Tracer is an early-stage, venture-backed startup building OpenSRE (https://github.com/Tracer-Cloud/opensre), the open source AI SRE agent that automatically investigates production incidents across the entire stack.
We are backed by experienced operators and investors who believe AI for production systems will be one of the defining enterprise software categories of the next decade. Our team is small, senior, and highly execution-focused.
We are opening up engineering roles to earlier-career engineers who are hungry to work on hard problems. This is a hands-on role where you will contribute directly to the OpenSRE codebase, build integrations used by engineers worldwide, and work closely with the founding team.
If you have shipped real projects, contributed to open source, or built side projects, we want to hear from you.
Python + LangGraph (for multi-agentic alert investigation)
Rust (because we like systems that are fast and correct)
ClickHouse (high-volume event + investigation history at scale)
AWS + Terraform (infrastructure that builds itself)
Build and maintain integrations connecting OpenSRE to tools like Grafana, Datadog, Kafka, Airflow and more
Contribute to the core agent: alert ingestion, context enrichment, hypothesis execution, and RCA reporting
Write tests and synthetic incident simulations for the RL environment
Review and respond to community pull requests and issues
Work directly with senior engineers and founders to ship features fast
Attend engineering conferences globally to connect with fellow engineers and grow awareness of the OpenSRE open source project
0-2 years of professional experience, or strong project/extracurricular experience as a graduate
Comfortable in Python
Has shipped something real: a side project, open source contribution, university project, or anything else you built and put in front of users
Curious about distributed systems, observability, or AI infrastructure
Bonus: prior open source contributions or experience with LangGraph, Grafana, Datadog, or cloud infrastructure
Total Compensation Range: up to £70,000 (incl. equity)
We structure compensation as follows:
Competitive base salary
Meaningful equity ownership with real upside
Final package depends on experience, impact, and seniority
What’s included:
Salary + equity
30 days annual leave
Employee health insurance
Visa sponsorship
Weekly team dinners and socials
Regular team offsites and trips (our most recent was Kenya 🇰🇪)
Introductory Call (15-30 mins): Call with our founding team to discuss your background and motivations, and to learn more about Tracer
Role Fit Interview (45 mins): Meet with our lead engineer or technical co-founder to review your working style, skills, and fit for the role
Take-home & Competency Deep Dive (1 hour): Complete a practical exercise (e.g., case study, presentation, or technical problem-solving) to explore the role's responsibilities and expectations
On-site meetup (Half Day): On-site interviews and team lunch at our headquarters to ask any questions and experience our office and culture firsthand
Offer: Final decision and offer

Tracer is the first pipeline monitoring system purpose-built for high-compute workloads. It lives in the OS and is tailored towards biotech and pharma.
Traditional tools lack the depth, breadth, and clarity required to keep up with explosive workload growth, leaving teams with blind spots, runaway costs, and endless debugging.
Tracer closes this gap by:
--> Capturing OS-level signals across every node
--> Reconstructing complete workflows across machines
--> Pinpointing root causes instantly without endless log-hunting
We integrate seamlessly into your existing infrastructure with a single click and combine this data to optimise pipelines, debug faster, and attribute real-time costs.
Result: scientists spend less time stuck in logs and more time advancing discovery.