Icertis

Senior Architect

Icertis  •  Pune, IN (Hybrid)  •  5 months ago

Job Description

We are looking for a Senior Architect, Machine Learning, to define and lead the architecture for enterprise-grade Generative AI and Agentic AI systems. This is a senior, hands-on architecture role focused on building reliable, scalable, secure, and cost-efficient AI platforms - covering RAG, agent orchestration, inference infrastructure, evaluation/guardrails, and production operations across multiple tenants.

You will work at the intersection of research innovation and engineering reliability: enabling rapid experimentation while ensuring the system runs 24/7 with strong SLOs, governance, and predictable cost.

  • Architecture & Technical Leadership

    • Own the end-to-end architecture for RAG + agentic workflows (Plan → Execute → Verify) across enterprise use cases (contracts, PDFs, knowledge bases).
    • Define architecture standards for multi-tenant isolation, API design, service boundaries, and integration patterns.
    • Lead technical decision-making: build vs buy, model strategy (hosted vs open-weights), tooling selection, and performance/cost tradeoffs.
    • Drive architecture reviews, mentor engineers/researchers, and raise the overall bar for engineering quality and research rigor.
  • RAG & Retrieval Systems (Enterprise-grade)
    • Design retrieval pipelines that optimize grounded accuracy: chunking strategy, hybrid retrieval, reranking, query rewriting, and context construction.
    • Define document ingestion patterns (PDF parsing, OCR, structured extraction, metadata enrichment) and index lifecycle strategies.
    • Establish retrieval evaluation and regression frameworks (ground truth, offline/online evaluation, drift tracking).
    • Enable async and event-driven architectures for long-running tasks using queues/streams (Kafka/RabbitMQ/Redis Streams) and/or durable workflow engines (Temporal).

  • Inference & Platform Engineering
    • Architect model serving for high throughput and low latency using engines like vLLM / TGI / Triton / TorchServe (as applicable).
    • Define GPU orchestration and capacity strategy on Kubernetes (AKS/EKS/GKE), including scale-to-zero, scheduling, and quota-based governance.
    • Design platform-level controls for rate limiting, caching, backpressure, and cost containment (tenant quotas, token budgets, throttling).
  • Safety, Guardrails, Security & Compliance
    • Own guardrail architecture for prompt injection defense, tool safety, policy enforcement, and PII handling (redaction patterns).
    • Define secure-by-default patterns: secrets management, data protection, audit logs, and safe prompt/tool execution boundaries.
    • Partner with security/compliance teams to meet enterprise standards (e.g., SOC 2/GDPR expectations where relevant).
  • Observability, Reliability & Operational Excellence
    • Establish SLOs and production readiness standards: error budgets, runbooks, incident response patterns.
    • Define observability strategy across LLM calls and agent tools: tracing, metrics, logs, cost dashboards, and token usage reporting.
    • Build reliability patterns for dependency failure (model provider downtime, throttling): circuit breakers, fallbacks, degradation strategies.

Required Qualifications

  • 13+ years of experience in ML systems / platform engineering / architecture roles, with ownership of production-grade systems.

  • Strong software engineering fundamentals: APIs, distributed systems patterns, testing, versioning, CI/CD, and operational readiness.

  • Hands-on experience with Kubernetes, Docker, and cloud-native design (Azure/AWS/GCP).

  • Strong experience designing event-driven and async architectures with durable execution patterns (queues/workflows).

  • Proven ability to lead architecture for complex systems involving ML/LLMs, data pipelines, and multi-service integration.

  • Strong Python proficiency; comfortable with async patterns and structured validation (e.g., Pydantic-style design).

Preferred Qualifications

  • Deep experience with RAG (retrieval + grounding + reranking) and evaluation techniques for hallucinations and answer quality.

  • Experience with agent frameworks and multi-step tool execution patterns (plan/execute/verify, tool routing, loop prevention).

  • Experience with open-weight models and adaptation methods (e.g., PEFT/LoRA), plus evaluation-driven iteration.

  • Experience with model inference optimization (throughput, batching, caching) and GPU efficiency management.

  • Experience operating observability stacks (OpenTelemetry, Prometheus/Grafana, Datadog) and LLM tracing tools.

Icertis is the global leader in AI-powered contract intelligence. The Icertis platform revolutionizes contract management, equipping customers with powerful insights and automation to grow revenue, control costs, mitigate risk, and ensure compliance - the pillars of business success. Today, more than one third of the Fortune 100 trust Icertis to realize the full intent of millions of commercial agreements in 90+ countries.

About Icertis

Industry: IT & Software
Company Size: 1,001-5,000 employees
Headquarters: Bellevue, WA
Year Founded: 2009