Job Description
Skills: Python, Torch/PyTorch, Machine Learning, Reinforcement Learning (RL), Natural Language Processing, Computer Vision, LLMs, Transformers, Fine-tuning
The Role
Frontier labs post-train on the open internet. We post-train on data the internet has never seen — handwritten deeds in six languages, regional-dialect voice conversations, survey maps and satellite imagery, decades of skewed scans. Off-the-shelf models, even the best ones, fail on this distribution. Closing that gap is the job.
Terra is a multi-agent system where models converse with users in voice and text, read documents and site photos, reason over video and GIS data, and make judgment calls that lenders and families act on. You'll own post-training across this surface: adapting open-weight models — language, vision-language, speech, and beyond — to the modalities and behaviors Terra needs, end to end from data curation through training, evals, and deployment.
This is hands-on training work in production, not prompt engineering with extra steps.
What You'll Work On
- Multimodal post-training. SFT and preference optimization (DPO/GRPO-class methods) on open-weight models across text, vision, and speech: document understanding today; voice agents, video, and geospatial reasoning on the near-term roadmap.
- Agentic behavior tuning. Training models to be good agents, not just good predictors: tool use, multi-step reasoning, code-switching across Indian languages, knowing when to escalate versus answer, and grounding every claim in evidence.
- Provenance-grounded outputs. Structured generation where answers carry their evidence: schema design, constrained decoding, and training strategies that make models cite rather than hallucinate. In our domain, a confident wrong answer costs someone their home.
- Speech and voice adaptation. Adapting ASR/TTS and speech-LLM models for Indian languages, accents, and the messy acoustics of real phone calls, so Terra can serve users who will never type.
- GIS and visual grounding. Teaching models to reason over survey maps, plot boundaries, and satellite imagery, and to reconcile them with what documents claim.
- Training data as a product. Annotation taxonomy design, mining production transcripts and failures into training data, and building the flywheel that compounds quality across every modality.
- Evals that gate releases. Evaluation suites per modality and per agent behavior, with regional quality floors: a model that works in Telangana but breaks in Karnataka never ships.
What We're Looking For
- 3–7 years in ML engineering with hands-on post-training experience: you've personally run SFT or preference-optimization jobs on open-weight models and shipped the result, not just read the papers.
- Depth in at least one modality beyond text — vision-language, speech, video, or geospatial — and the appetite to expand into the others. Nobody arrives knowing all of them; we care that you've gone deep once and can do it again.
- Fluency with the modern training stack: PyTorch, Hugging Face ecosystem (transformers, PEFT, TRL or similar), experiment tracking, GPU training workflows.
- Strong evaluation instincts — you design the eval before the training run, and you explain regressions from error analysis, not vibes.
- Data-centric mindset: you know most post-training wins come from better data, and you have the tooling sense to build curation pipelines.
- Solid Python and the engineering discipline to make training reproducible.
Nice to Have
- Experience tuning models for agentic behavior: tool use, function calling, multi-turn dialogue, or RL on agent trajectories.
- VLM fine-tuning (Qwen, Nemotron, Gemma class) or speech-model adaptation (Whisper-class ASR, speech LLMs, TTS) for Indic languages.
- DPO/RLHF/GRPO experience, reward modeling, or preference-data design.
- Inference optimization: vLLM, quantization, multi-adapter serving.
- Remote sensing, GIS, or video understanding background.
- Domains where provenance and correctness are non-negotiable — legal, fintech, healthcare.
Your First 90 Days
Days 1–30: Ground truth. Ship an improvement to a production model in your first two weeks. Read documents, listen to call recordings, study failure transcripts until you understand why this data breaks pretrained models. Own quality for one model surface end to end.
Days 31–60: Own a modality. Take full ownership of one post-training track — document VLMs, voice, or agent behavior — including its data pipeline, eval suite, and a measurable quality lift shipped to production.
Days 61–90: Shape the stack. Make the call on a structural bet — preference optimization rollout, a new modality's training approach, adapter consolidation — backed by evidence from your first 60 days. By now you should be setting post-training direction across Terra, not just executing it.
Why This Role
- Post-training scope that frontier labs split across whole teams — text, vision, speech, and agents, all yours to shape.
- A data distribution nobody else has: your work can't be replicated by anyone scraping the internet.
- Direct stakes — Terra's judgments decide whether property purchases are safe, with lenders and families relying on the output.
- Small, senior team; backed by Y Combinator and top investors, with real revenue and real users.
Interview Process
We move fast — the full loop takes 5–7 days, and we'll give you a decision within 48 hours of your final round.
- Intro call (30 min). Mutual fit, your background, and a walkthrough of the problem space.
- Post-training deep-dive (60 min). A fine-tuning project you ran end to end: data decisions, training setup, what the evals caught, what you'd redo. We're probing for hands-on depth.
- Work sample (take-home or paired, your choice, ~3 hrs). Design a post-training and eval approach for a real Terra problem in your strongest modality: data strategy, training plan, and how you'd know it worked.
- Systems + eval round (60 min). Design the full post-training loop for a multi-agent, multimodal system: curation → training → eval gates → serving. Expect pushback.
- Founder conversation (45 min). Values, ambition, and your questions about where the company is going.