Weekday (YC W21)

Scientific AI Evaluation & Computational Problem Designer

Weekday (YC W21)  •  $100/hr  •  United States (Remote)  •  22 days ago

Job Description

This role is for one of our clients.

Compensation: $45-$100 per hour

We are building a large-scale evaluation benchmark to test advanced AI reasoning across scientific and engineering domains. This role focuses on designing rigorous, research-grade computational problems that assess how effectively AI systems can leverage real scientific software tools to solve complex challenges.

Unlike traditional annotation roles, this position requires creating original, graduate-level problems rooted in real-world scientific workflows. You will iteratively refine these problems through calibration against state-of-the-art AI models, ensuring the right balance of difficulty, depth, and reasoning complexity.

What You’ll Do

  • Design advanced computational problems requiring the use of domain-specific scientific software
  • Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)
  • Develop problem setups, solution pathways, and validation mechanisms
  • Calibrate and refine tasks based on model performance to achieve target difficulty levels
  • Ensure problems emphasize reasoning strategy over brute-force computation

Domains & Tools of Interest
We are particularly seeking candidates with hands-on experience in:

  • Bioinformatics & Single-Cell Genomics: scanpy, scvelo, squidpy, gudhi (RNA-seq, trajectory inference, spatial transcriptomics)
  • Computational Chemistry: PySCF (HF, DFT, TDDFT, CASSCF, post-HF methods)
  • Particle & Nuclear Physics: scikit-hep, Monte Carlo simulations, collider data analysis
  • Electrical Engineering: scikit-rf, ngspice (RF systems, circuit simulation)
  • Astrophysics & Cosmology: astropy (cosmological modeling, survey analysis)
  • Structural & Mechanical Engineering: scikit-fem (finite element analysis, elasticity, beam theory)
  • Seismology & Geophysics: ObsPy, SPECFEM (waveform analysis, inversion, tomography)
  • Pharmacokinetics & Systems Biology: libRoadRunner, Tellurium, SBML-based tools

Experience with other specialized tools in related domains is also welcome.

What Makes You a Strong Fit

  • Graduate-level expertise (MS or PhD preferred) in a relevant STEM field
  • Hands-on experience using scientific software libraries for real research problems
  • Strong Python programming skills, including building computational workflows and validators
  • Ability to design challenging problems that require deep reasoning rather than surface-level solutions
  • Familiarity with edge cases, limitations, and practical challenges of scientific tools

Requirements

  • Demonstrated proficiency with at least one relevant scientific library (via research, open-source work, or industry experience)
  • Ability to work independently and iterate based on feedback
  • Comfort working in Linux/terminal environments and remote compute setups
  • Availability of at least 15–20 hours per week

Nice to Have

  • Experience across multiple domains or tools
  • Background in evaluation frameworks or benchmarking
  • Experience in teaching, pedagogy, or problem-set design
  • Familiarity with reproducible research practices and containerized environments

Engagement Details

  • Independent contractor role
  • Fully remote with flexible scheduling
  • Project scope may evolve based on performance and research needs

Compensation & Payments

  • Competitive compensation based on expertise and domain specialization
  • Weekly payments via supported global payment platforms

Additional Information

  • Work must not involve sharing confidential or proprietary information from any current or past employer or institution
  • Projects may be extended, modified, or concluded based on performance and business requirements
  • This opportunity does not currently support certain work authorization categories

About Weekday (YC W21)

Weekday is an AI recruiter that runs outbound sourcing campaigns to find top talent. At Weekday, we have built the most accurate database of talent (250M+ people in the US and India, with contact data) and run outbound campaigns to identify top talent for any role you might be hiring for. We generate the highest response rates (30–40%) on our campaigns, making sourcing talent as easy as posting a job. We are backed by Y Combinator and were also ranked #1 on Product Hunt.

Industry
Consulting & Advisory
Company Size
51-200 employees
Headquarters
San Francisco, CA
Year Founded
2021