Softgic

Agent Quality / Evals Engineer 1754

Softgic  •  Remote  •  2 hours ago

Job Description


This is a remote position.


This role owns the eval harness and quality gate from the beginning. It replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

• Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.

• Wire evals into CI so quality regressions fail builds and releases.

• Define and maintain release-gate thresholds with Product and the Tech Lead.

• Lay the path for later adversarial and drift-testing expansion without overbuilding MVP scope.
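
As a rough illustration of the responsibilities above, a minimal harness with a release gate might look like the sketch below. Everything in it is an assumption made for illustration and is not taken from Softgic's actual stack: the names `EvalTask`, `run_suite`, `gate`, and `toy_agent`, the two suite labels, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalTask:
    """One eval case (hypothetical shape, for illustration only)."""
    name: str
    suite: str                    # "golden" or "exception"
    prompt: str
    check: Callable[[str], bool]  # True when the agent output is acceptable


def run_suite(agent: Callable[[str], str], tasks: List[EvalTask]) -> Dict[str, float]:
    """Run every task and return the pass rate per suite (a simple scorecard)."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for task in tasks:
        total[task.suite] = total.get(task.suite, 0) + 1
        ok = task.check(agent(task.prompt))
        passed[task.suite] = passed.get(task.suite, 0) + int(ok)
    return {suite: passed[suite] / total[suite] for suite in total}


def gate(scorecard: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Compare the scorecard to minimum pass rates; an empty list means the gate passes."""
    failures = []
    for suite, minimum in thresholds.items():
        score = scorecard.get(suite)
        if score is None:
            failures.append(f"{suite}: no results")
        elif score < minimum:
            failures.append(f"{suite}: {score:.2f} < {minimum:.2f}")
    return failures


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent: it just upper-cases the prompt."""
    return prompt.upper()


TASKS = [
    EvalTask("greets", "golden", "hello", lambda out: out == "HELLO"),
    EvalTask("ships", "golden", "ship it", lambda out: out == "SHIP IT"),
    EvalTask("empty-input", "exception", "", lambda out: out == ""),
    # The toy agent never refuses, so this exception task fails on purpose.
    EvalTask("refuses-delete", "exception", "delete all", lambda out: out == "REFUSED"),
]

# Assumed thresholds; in practice these would be agreed with Product and the Tech Lead.
THRESHOLDS = {"golden": 1.0, "exception": 0.9}

scorecard = run_suite(toy_agent, TASKS)
failures = gate(scorecard, THRESHOLDS)
print(scorecard)
print(failures)
```

In a CI job, a wrapper script could exit with a non-zero status whenever `failures` is non-empty, which is one simple way to make a quality regression fail the build. Real agent outputs are noisy, so a production gate would likely run each task several times and compare aggregated rates rather than single runs.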


Requirements


Must-Have Qualifications

• Experience evaluating ML, LLM, or non-deterministic systems.

• Strong test and benchmark design capability.

• Comfort working with noisy metrics, thresholds, and probabilistic behavior.

• Good scripting and automation skills.

AI-First Expectations

• Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.

• Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

• The first reference agent has a published scorecard and gated eval path.

• Golden and exception tests run automatically.

• The team can explain what “good enough to ship” means in measurable terms.

About Softgic

We are a young, growing company with operations in Medellin and Bogota. We build technology solutions in close collaboration with our customers and our team, so that these solutions add value to our customers' organizations and business processes.

Industry: IT & Software
Company Size: 51-200 employees
Headquarters: Unknown
Year Founded: 2011