
Senior AI Testing Engineer (Generative AI)
Location: India. Remote / Hybrid / In-office [specify your actual working model here]
Experience: 5–8 years of total experience in software testing, QA engineering,
or SDET roles, including at least 2–3 years of hands-on work with
Generative AI systems, LLM applications, or AI quality engineering.
Role Overview
We are looking for a Senior AI Testing Engineer to own quality across our
Generative AI products and platform.
This role is fundamentally about engineering quality into AI systems — not
running test scripts. You'll design evaluation frameworks, build automated
testing pipelines, and define what "good" looks like for LLM outputs,
RAG systems, AI agents, and voice AI applications. You'll work directly with AI
engineers and product teams to make sure our systems are reliable, safe, and
measurably improving over time.
If you understand how LLMs fail, know how to catch hallucinations before
users do, and want to build the quality infrastructure that underpins
production AI at scale — this is the role.
Key Responsibilities
Evaluation Strategy & Frameworks
· Design and own comprehensive testing strategies for Generative AI products, including LLM applications, RAG pipelines, AI agents, voice AI systems, and workflow automation
· Define evaluation methodologies covering functional testing, response quality, hallucination detection, safety and guardrail testing, prompt injection, bias and toxicity, retrieval quality, latency benchmarking, and agent workflow validation
· Build reusable AI testing frameworks and automation pipelines for continuous evaluation
· Create datasets, benchmark suites, and golden test sets for GenAI evaluation
Automated Evaluation
· Develop automated evaluation pipelines using LLM-as-a-Judge and hybrid evaluation methods
· Implement CI/CD-integrated AI evaluation pipelines
· Drive observability and monitoring strategies for production AI systems
Quality Standards & Collaboration
· Define measurable quality KPIs for AI systems
· Establish testing standards, best practices, and governance processes for GenAI applications
· Work closely with AI engineers, product, and platform teams to embed quality throughout the development lifecycle
Required Skills & Experience
Testing & Engineering Experience
· 5–8 years in software testing, QA engineering, SDET, or test automation
· 2–3 years of hands-on experience testing or evaluating production-grade Generative AI or LLM-based systems
· Strong test automation skills in Python
· Experience designing scalable automated testing frameworks
· Familiarity with API testing, integration testing, and performance testing
Generative AI Knowledge
· Solid understanding of how LLM systems work, and how they fail
· Experience with RAG architectures, prompt engineering, AI agents, embedding models, and vector databases
· Understanding of LLM evaluation methodologies and AI system failure modes
GenAI Testing Frameworks
· Hands-on experience with at least one GenAI evaluation framework, such as DeepEval, Ragas, LangSmith, Promptfoo, TruLens, OpenAI Evals, or LangChain evaluation tools
Quality Engineering
· Expertise in test strategy, test planning, test automation architecture, defect lifecycle management, and quality metrics
· Ability to define and track measurable quality KPIs for AI systems
Preferred Qualifications
· Experience with cloud platforms (AWS, Azure, or GCP)
· Familiarity with MLOps / LLMOps workflows
· Experience with CI/CD pipelines and DevOps practices
· Exposure to monitoring and observability tooling for AI systems
· Understanding of security and compliance for GenAI products
· Experience with conversational AI or voice AI systems

Founded in 2001 by Silicon Valley entrepreneur Dr. Romesh Wadhwani, Wadhwani Foundation is a global non-profit committed to accelerating job growth in emerging economies and enabling millions to earn a family-sustaining wage and lead a dignified life. It pursues this mission through four core initiatives: Entrepreneurship, Skilling, Innovation & Research, and Government Digital Transformation.