Job Description

This role is for one of our clients

Compensation: $45-$100 per hour

We are building a large-scale evaluation benchmark to test advanced AI reasoning across scientific and engineering domains. This role focuses on designing rigorous, research-grade computational problems that assess how effectively AI systems can leverage real scientific software tools to solve complex challenges.

Unlike traditional annotation roles, this position requires creating original, graduate-level problems rooted in real-world scientific workflows. You will iteratively refine these problems through calibration against state-of-the-art AI models, ensuring the right balance of difficulty, depth, and reasoning complexity.

Requirements

What You’ll Do

Design advanced computational problems requiring the use of domain-specific scientific software
Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)
Develop problem setups, solution pathways, and validation mechanisms
Calibrate and refine tasks based on model performance to achieve target difficulty levels
Ensure problems emphasize reasoning strategy over brute-force computation
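
As a rough illustration of what a simple validation mechanism could look like, the sketch below checks a submitted numeric answer against a reference value within a relative tolerance. It is not the project's actual tooling: the function name, the tolerance, and the reference value are all hypothetical and chosen only for illustration.

```python
import math


def validate_answer(submitted: float, reference: float, rel_tol: float = 1e-3) -> bool:
    """Return True if `submitted` matches `reference` within `rel_tol`.

    Hypothetical helper for illustration; a real validator would likely
    also check units, intermediate results, or output file formats.
    """
    # Reject NaN or infinite submissions outright.
    if not math.isfinite(submitted):
        return False
    # Relative comparison, so the check scales with the size of the answer.
    return math.isclose(submitted, reference, rel_tol=rel_tol)


# Made-up reference value for a hypothetical task.
REFERENCE_VALUE = 2.5

print(validate_answer(2.5001, REFERENCE_VALUE))  # within tolerance
print(validate_answer(3.0, REFERENCE_VALUE))     # outside tolerance
```

A relative tolerance is used here on the assumption that answers may span several orders of magnitude; an absolute tolerance may suit quantities expected to be near zero.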

Domains & Tools of Interest

We are particularly seeking candidates with hands-on experience in:

Bioinformatics & Single-Cell Genomics: scanpy, scvelo, squidpy, gudhi (RNA-seq, trajectory inference, spatial transcriptomics)
Computational Chemistry: PySCF (HF, DFT, TDDFT, CASSCF, post-HF methods)
Particle & Nuclear Physics: scikit-hep, Monte Carlo simulations, collider data analysis
Electrical Engineering: scikit-rf, ngspice (RF systems, circuit simulation)
Astrophysics & Cosmology: astropy (cosmological modeling, survey analysis)
Structural & Mechanical Engineering: scikit-fem (finite element analysis, elasticity, beam theory)
Seismology & Geophysics: ObsPy, SPECFEM (waveform analysis, inversion, tomography)
Pharmacokinetics & Systems Biology: libRoadRunner, Tellurium, SBML-based tools

Experience with other specialized tools in related domains is also welcomed.

What Makes You a Strong Fit

Graduate-level expertise (MS or PhD preferred) in a relevant STEM field
Hands-on experience using scientific software libraries for real research problems
Strong Python programming skills, including building computational workflows and validators
Ability to design challenging problems that require deep reasoning rather than surface-level solutions
Familiarity with edge cases, limitations, and practical challenges of scientific tools

Requirements

Demonstrated proficiency with at least one relevant scientific library (via research, open-source work, or industry experience)
Ability to work independently and iterate based on feedback
Comfort working in Linux/terminal environments and remote compute setups
Availability of at least 15–20 hours per week

Nice to Have

Experience across multiple domains or tools
Background in evaluation frameworks or benchmarking
Experience in teaching, pedagogy, or problem-set design
Familiarity with reproducible research practices and containerized environments

Engagement Details

Independent contractor role
Fully remote with flexible scheduling
Project scope may evolve based on performance and research needs

Compensation & Payments

Competitive compensation based on expertise and domain specialization
Weekly payments via supported global payment platforms

Additional Information

Work must not involve sharing confidential or proprietary information from any current or past employer or institution
Projects may be extended, modified, or concluded based on performance and business requirements
This opportunity does not currently support certain work authorization categories

About Weekday (YC W21)

Weekday is an AI recruiter that runs outbound sourcing campaigns to find top talent. We have built the most accurate database of talent (250M+ people in the US and India, with contact data), and we run outbound campaigns to identify top talent for any role you are hiring for. Our campaigns generate the highest response rates (30-40%), making sourcing talent as easy as posting a job. We are backed by Y Combinator and were also ranked #1 on Product Hunt.

Industry

Consulting & Advisory

Company Size

51-200 employees

Headquarters

San Francisco, CA

Year Founded

2021

Website

weekday.works

Scientific AI Evaluation & Computational Problem Designer