Alice builds adversarial evaluation infrastructure used by the world's leading AI labs to stress-test their most capable agents before deployment. We're hiring interns for our RL Gym environments: adversarial training grounds where AI agents face prompt injection, social engineering, and data exfiltration scenarios.
The Role: You design prompt injection scenarios in YAML, run them against frontier models, validate success rates, and submit passing tasks.
The Task
You build adversarial prompt injection tasks for Alice's RL Gym platform. Each task is a self-contained YAML scenario simulating a realistic AI agent deployment, testing whether the agent can be manipulated into violating its safety policies.
What a task includes:
What We're Looking For
What We Offer
If you’re eager to learn, innovate, and grow in the field of data engineering, we’d love to hear from you. Apply today to be part of a team that values creativity and technical excellence!
Please note that only shortlisted candidates will be contacted.
ActiveFence is the leading tool stack for Trust & Safety teams worldwide. By relying on ActiveFence’s end-to-end solution, Trust & Safety teams – of all sizes – can keep users safe from the widest spectrum of online harms, unwanted content, and malicious behavior, including child safety, disinformation, fraud, hate speech, terror, nudity, and more.
Using cutting-edge AI and a team of world-class subject-matter experts to continuously collect, analyze, and contextualize data, ActiveFence ensures that in an ever-changing world, customers are always two steps ahead of bad actors. As a result, Trust & Safety teams can be proactive and provide maximum protection to users across a multitude of abuse areas in 70+ languages.
Backed by leading Silicon Valley investors such as CRV and Norwest, ActiveFence has raised $100M to date, employs 300 people worldwide, and has contributed to the online safety of billions of users across the globe.

ActiveFence is the leading provider of AI security and safety solutions, protecting online experiences and AI applications for over 3 billion users, top foundation models, and the world’s largest enterprises and tech platforms.
As a trusted partner to major technology companies and Fortune 500 brands, we secure user-generated and GenAI products against prompt injection, adversarial attacks, and harmful content through Real-Time Guardrails, continuous Red Teaming, and the industry’s most advanced threat intelligence.
With unmatched detection capabilities in 117+ languages, ActiveFence empowers organizations to deliver engaging, safe, and trustworthy experiences globally, helping them innovate responsibly while staying ahead of emerging threats.