Machine Learning Evaluation Specialist (Remote)
List of accepted countries and locations
Important for US applicants: This is a 1099 independent contractor role and is not compatible with F-1 OPT, STEM OPT, or other visa statuses that require W-2 employment, guaranteed hours, or employer sponsorship. We are unable to provide offer letters or employment verification for this role.
Help design the hardest ML problems state-of-the-art AI hasn't solved yet.
We're hiring domain experts to build evaluation tasks that challenge the frontier of AI. This is not an ML engineering role — it's a research role. You'll use deep expertise in your field to create problems that general ML knowledge can't touch.
What you'll do
Propose and frame original, research-grade ML problems rooted in your domain
Design evaluation tasks that require specialized knowledge well beyond standard pipelines
Assess AI-generated solutions for correctness, creativity, and methodological rigor — and explain exactly where and why they fall short
Document problem difficulty, required domain knowledge, and expected failure modes
What you need
Graduate-level expertise (MS or PhD preferred) in a scientific or technical domain that intersects with ML
Strong working knowledge of ML methods — model selection, feature engineering, evaluation metrics
Deep familiarity with active research problems in your field — you know where general ML knowledge runs out
Excellent written communication — you can articulate complex problems clearly and precisely. This cannot be overstated.
Self-motivated and comfortable working independently on intellectually demanding tasks
What you don't need
No prior AI training or RLHF experience required
No software engineering background needed — domain expertise and research instincts are what matter
Domains we're especially looking for
Computational Biology / Bioinformatics
Genomics / Molecular Biology
Physics / Astrophysics / Signal Processing
Climate / Environmental Modeling
Healthcare / Medical Imaging
Neuroscience / Brain-Computer Interfaces
Materials Science / Chemistry
Finance / Quantitative Modeling
Robotics / Control Systems / Reinforcement Learning
Advanced NLP (specialized domains)
Mathematics / Statistics (applied)
Logistics
Fully remote — work from anywhere
$200–$400/hr depending on domain and seniority
10–40 hrs/week, hourly contract
Assessment required — paid if approved
Independent contractor (1099) — not compatible with F-1 OPT, STEM OPT, or visa statuses requiring W-2 employment or employer sponsorship
⚠️ This is a project-based, freelance opportunity with no guaranteed hours. We recommend keeping other work options open while waiting for project assignment.

G2i is a hiring community connecting remote developers with world-class engineering teams. Our unique approach combines rigorous technical assessments with a solid commitment to developer health, ensuring companies get skilled developers who are supported, valued, and ready to execute from day one.
Our transparent vetting process includes in-depth, performance-ranked developer profiles, recorded technical interviews, and soft-skills assessments. Whether you're working on a short-term project or burning down a backlog, G2i connects you with a community of pre-vetted developers.
Planning to hire ten or more engineers? We create a Custom Talent Pipeline, allowing for specific customizations in sourcing, assessment criteria, technical interview questions, and integration with your existing HR systems and processes.
G2i partners with clients who support the developer health mission—matching developers with environments that improve their health, support recovery from burnout, and enable professional growth through restful work.
Is your team overworked or understaffed? Contact us today to learn how G2i can help you.
More information about our mission and commitment to developers and clients is available at https://g2i.co. You can also follow us on X at @g2i_co.