Job Description

We are looking for a Senior Architect, Machine Learning to define and lead the architecture for enterprise-grade Generative AI and Agentic AI systems. This is a senior, hands-on architecture role focused on building reliable, scalable, secure, and cost-efficient AI platforms - covering RAG, agent orchestration, inference infrastructure, evaluation/guardrails, and production operations across multiple tenants.

You will work at the intersection of research innovation and engineering reliability: enabling rapid experimentation while ensuring the system runs 24/7 with strong SLOs, governance, and predictable cost.

Architecture & Technical Leadership
- Own the end-to-end architecture for RAG + agentic workflows (Plan → Execute → Verify) across enterprise use cases (contracts, PDFs, knowledge bases).
- Define architecture standards for multi-tenant isolation, API design, service boundaries, and integration patterns.
- Lead technical decision-making: build vs buy, model strategy (hosted vs open-weights), tooling selection, and performance/cost tradeoffs.
- Drive architecture reviews, mentor engineers/researchers, and raise the overall bar for engineering quality and research rigor.
RAG & Retrieval Systems (Enterprise-grade)
- Design retrieval pipelines that optimize grounded accuracy: chunking strategy, hybrid retrieval, reranking, query rewriting, and context construction.
- Define document ingestion patterns (PDF parsing, OCR, structured extraction, metadata enrichment) and index lifecycle strategies.
- Establish retrieval evaluation and regression frameworks (ground truth, offline/online evaluation, drift tracking).
- Enable async and event-driven architectures for long-running tasks using queues/streams (Kafka/RabbitMQ/Redis Streams) and/or durable workflow engines (Temporal).
Inference & Platform Engineering
- Architect model serving for high throughput and low latency using engines like vLLM / TGI / Triton / TorchServe (as applicable).
- Define GPU orchestration and capacity strategy on Kubernetes (AKS/EKS/GKE), including scale-to-zero, scheduling, and quota-based governance.
- Design platform-level controls for rate limiting, caching, backpressure, and cost containment (tenant quotas, token budgets, throttling).
Safety, Guardrails, Security & Compliance
- Own guardrail architecture for prompt injection defense, tool safety, policy enforcement, and PII handling (redaction patterns).
- Define secure-by-default patterns: secrets management, data protection, audit logs, and safe prompt/tool execution boundaries.
- Partner with security/compliance teams to meet enterprise standards (e.g., SOC 2/GDPR expectations where relevant).
Observability, Reliability & Operational Excellence
- Establish SLOs and production readiness standards: error budgets, runbooks, incident response patterns.
- Define observability strategy across LLM calls and agent tools: tracing, metrics, logs, cost dashboards, and token usage reporting.
- Build reliability patterns for dependency failure (model provider downtime, throttling): circuit breakers, fallbacks, degradation strategies.

Required Qualifications

- 13+ years of experience in ML systems / platform engineering / architecture roles, with ownership of production-grade systems.
- Strong software engineering fundamentals: APIs, distributed systems patterns, testing, versioning, CI/CD, and operational readiness.
- Hands-on experience with Kubernetes, Docker, and cloud-native design (Azure/AWS/GCP).
- Strong experience designing event-driven and async architectures with durable execution patterns (queues/workflows).
- Proven ability to lead architecture for complex systems involving ML/LLMs, data pipelines, and multi-service integration.
- Strong Python proficiency; comfortable with async patterns and structured validation (e.g., Pydantic-style design).

Preferred Qualifications

- Deep experience with RAG (retrieval + grounding + reranking) and evaluation techniques for hallucinations and answer quality.
- Experience with agent frameworks and multi-step tool execution patterns (plan/execute/verify, tool routing, loop prevention).
- Experience with open-weight models and adaptation methods (e.g., PEFT/LoRA), plus evaluation-driven iteration.
- Experience with model inference optimization (throughput, batching, caching) and GPU efficiency management.
- Experience operating observability stacks (OpenTelemetry, Prometheus/Grafana, Datadog) and LLM tracing tools.

About Icertis

Icertis is the global leader in AI-powered contract intelligence. The Icertis platform revolutionizes contract management, equipping customers with powerful insights and automation to grow revenue, control costs, mitigate risk, and ensure compliance - the pillars of business success. Today, more than one third of the Fortune 100 trust Icertis to realize the full intent of millions of commercial agreements in 90+ countries.

Industry

IT & Software

Company Size

1,001-5,000 employees

Headquarters

Bellevue, WA

Year Founded

2009

Website

icertis.com
