REAL

Senior AI/ML Engineer - AI Systems Evaluation

REAL  •  Tel Aviv, IL (Hybrid)  •  16 days ago

Job Description

REAL is building an AI Execution Platform for real estate organizations.

Today, the data required to run real estate is scattered across fragmented systems, leading to missed insights and preventable financial leakage.

REAL transforms this complexity into connected intelligence and automated execution, enabling enterprises to operate with greater precision and confidence.

REAL Values

  • Ownership: We take responsibility and move decisively.
  • Clarity: We simplify complexity to deliver meaningful impact.
  • Accuracy: Precision matters in everything we build.
  • Velocity: We work with urgency and intent.
  • Partnership: We collaborate closely with customers and teammates.

  • Own the systems that define, measure, and enforce AI quality at REAL.
  • Translate ambiguous model behavior into measurable signals, automated tests, and release gates.
  • Operate across evaluation design, tooling, and production integration.

What You'll Do

  • Design evaluation architectures (benchmarks, regression suites, coverage)
  • Build automated pipelines to run and score evals across models and prompts
  • Implement scoring systems (LLM-as-judge, rubrics, hybrid approaches)
  • Create and maintain golden datasets + edge-case suites
  • Develop internal tools for prompt testing, dataset generation, experiment tracking
  • Instrument systems for traces, outputs, and debugging
  • Detect regressions and enforce quality gates in CI/CD
  • Monitor model performance in production
  • Close the loop between eval insights and product improvements

Requirements

What We're Looking For

  • 3-6 years building production software, internal platforms, ML/data infrastructure, experimentation systems, or AI tooling
  • Strong backend and systems engineering fundamentals with hands-on applied AI experience
  • Strong Python skills and production-level systems experience
  • Experience building testing frameworks or validation systems end-to-end
  • Hands-on experience with LLMs, RAG, and agent workflows
  • Understanding of eval methods (benchmarking, A/B testing, LLM-as-judge, human-in-the-loop (HITL))
  • Experience with observability / logging / experiment tracking
  • Strong systems thinking (coverage, reliability, reproducibility)
  • Comfort with non-deterministic systems

Nice to Have

  • Experience with eval, tracing, observability, or experimentation tooling (one or more of the following: LangSmith, Braintrust, Phoenix, MLflow, OpenTelemetry, PostHog, custom eval stacks)
  • Familiarity with dataset/versioning workflows, HITL systems, and production AI observability systems
  • CI/CD integration for model evaluation
  • Background in search, retrieval, or document systems
  • Experience building internal platforms or developer tools
  • Experience working in startups and business-driven environments

About REAL

Accelerate your workflows with REAL, an AI-powered real estate management platform. It’s time to get REAL. From managing your leases to optimizing your portfolio, REAL puts streamlined workflows at your fingertips. As a result, you can reduce operating costs, maintain a holistic view of your portfolio, and make faster, better decisions.

REAL pairs you with virtual employees for financial analysis, portfolio questions, and much more. Clear answers are just one natural language query away.

Come discover the REAL difference with a free trial today!

Industry: IT & Software
Company Size: 11-50 employees
Headquarters: Boston, Massachusetts
Year Founded: Unknown
Website: real.dev