Job Description
This role is for one of our clients
Compensation: $45–$100 per hour
We are building a large-scale evaluation benchmark to test advanced AI reasoning across scientific and engineering domains. This role focuses on designing rigorous, research-grade computational problems that assess how effectively AI systems can leverage real scientific software tools to solve complex challenges.
Unlike traditional annotation roles, this position requires creating original, graduate-level problems rooted in real-world scientific workflows. You will iteratively refine these problems through calibration against state-of-the-art AI models, ensuring the right balance of difficulty, depth, and reasoning complexity.
What You’ll Do
- Design advanced computational problems requiring the use of domain-specific scientific software
- Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)
- Develop problem setups, solution pathways, and validation mechanisms
- Calibrate and refine tasks based on model performance to achieve target difficulty levels
- Ensure problems emphasize reasoning strategy over brute-force computation
Domains & Tools of Interest
We are particularly seeking candidates with hands-on experience in:
- Bioinformatics & Single-Cell Genomics: scanpy, scvelo, squidpy, gudhi (RNA-seq, trajectory inference, spatial transcriptomics)
- Computational Chemistry: PySCF (HF, DFT, TDDFT, CASSCF, post-HF methods)
- Particle & Nuclear Physics: scikit-hep, Monte Carlo simulations, collider data analysis
- Electrical Engineering: scikit-rf, ngspice (RF systems, circuit simulation)
- Astrophysics & Cosmology: astropy (cosmological modeling, survey analysis)
- Structural & Mechanical Engineering: scikit-fem (finite element analysis, elasticity, beam theory)
- Seismology & Geophysics: ObsPy, SPECFEM (waveform analysis, inversion, tomography)
- Pharmacokinetics & Systems Biology: libRoadRunner, Tellurium, SBML-based tools
Experience with other specialized tools in related domains is also welcome.
What Makes You a Strong Fit
- Graduate-level expertise (MS or PhD preferred) in a relevant STEM field
- Hands-on experience using scientific software libraries for real research problems
- Strong Python programming skills, including building computational workflows and validators
- Ability to design challenging problems that require deep reasoning rather than surface-level solutions
- Familiarity with edge cases, limitations, and practical challenges of scientific tools
Requirements
- Demonstrated proficiency with at least one relevant scientific library (via research, open-source work, or industry experience)
- Ability to work independently and iterate based on feedback
- Comfort working in Linux/terminal environments and remote compute setups
- Availability of 15–20 hours per week or more
Nice to Have
- Experience across multiple domains or tools
- Background in evaluation frameworks or benchmarking
- Experience in teaching, pedagogy, or problem-set design
- Familiarity with reproducible research practices and containerized environments
Engagement Details
- Independent contractor role
- Fully remote with flexible scheduling
- Project scope may evolve based on performance and research needs
Compensation & Payments
- Competitive compensation based on expertise and domain specialization
- Weekly payments via supported global payment platforms
Additional Information
- Work must not involve sharing confidential or proprietary information from any current or past employer or institution
- Projects may be extended, modified, or concluded based on performance and business requirements
- This opportunity does not currently support certain work authorization categories