Job Description

About Nuance Labs

Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model—a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.

We’re working toward AI that can read subtle human signals—a shift in tone, a glance, a pause—and respond in a way that feels natural and grounded in context. This is foundational work at the frontier of multimodal learning and real-time systems.

We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels. Our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta shipping ultra-low-latency ML systems used by millions.

Why This Role Exists

Large multimodal models are advancing quickly, but real-time, human-centered interaction remains unsolved. Training models that can reason across text, speech, vision, and embodied signals while operating under tight latency constraints requires new approaches to architecture, data, and optimization.

This role exists to own and define how multimodal large language models are trained inside a broader human foundation model. As a Founding Research Scientist, you’ll set technical direction, design training strategies, and turn research ideas into systems that can operate in the real world.

This is a blank-page role with real agency. You’ll decide what problems matter, how we tackle them, and how research translates into working, scalable models.

What You’ll Be Building

You’ll help build the first human foundation model that operates across text, speech, facial expression, and body language in real time.

Your work will power systems that:

Understand fine-grained human signals across modalities and infer meaning in context
Reason autoregressively over multimodal inputs in real time
Drive lifelike avatars whose expressions, gestures, and tone evolve frame-by-frame during interaction

The field is wide open. Existing solutions treat language, voice, and vision as separate problems. This role offers the rare chance to define how these modalities are trained and unified at the foundation-model level.

What You’ll Own

You’ll operate as a founding-level researcher with end-to-end ownership over MLLM training and evaluation.

You will:

Design and train multimodal large language models and autoregressive architectures
Own the full ML pipeline, from dataset design and preprocessing to large-scale training and benchmarking
Develop training strategies that balance quality, generalization, and real-time performance
Push research breakthroughs into practical, production-oriented systems
Explore new architectures, objectives, and scaling strategies for multimodal reasoning
Write clean, maintainable research code that enables rapid iteration
Collaborate closely with researchers across speech, vision, and systems engineering

Who Will Thrive Here

You’re comfortable operating at the research frontier and making progress without a playbook. You care deeply about model behavior, but you’re equally motivated by getting things to work outside the lab.

You likely:

Enjoy blank-page research problems and defining technical direction
Move quickly from ideas to experiments to results
Think deeply about data, evaluation, and failure modes
Thrive in highly collaborative, cross-domain teams

Requirements

PhD or equivalent experience in multimodal LLMs, MLLM training, or closely related fields
Deep expertise in training large-scale autoregressive models
Strong command of modern deep learning and distributed training systems
Experience running the full ML lifecycle, from data curation to evaluation
Ability to translate research insights into practical systems
Strong coding skills and a commitment to clean, maintainable research code
Clear communication and strong collaboration skills

Nice to Have

Publications at top ML or multimodal AI conferences
Experience with real-time or low-latency ML systems
Prior work unifying language, vision, and/or speech models
Experience shipping large ML systems into production

Why Join Now

Joining Nuance Labs now means defining the training foundation of a category-defining AI system. You’ll have outsized influence over core research decisions, work in-person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.

About Rethink Recruit

At Rethink Recruit, we combine an old-school work ethic with a modern approach to recruitment. Our competitive edge comes from our niche focus on Autonomous Driving, EVs, AI, Robotics, and Blockchain (FinTech), and from our ability to adapt to new and emerging technologies.

For our clients, this means access to our talent pool of tens of thousands of pre-qualified, industry- and skill-specific candidates with whom we have developed close working relationships over the past decade.

For our candidates, this means spending less time on the job search by engaging with more industry-specific companies and more skill-specific jobs.

We believe that any agency can find marginal success by deploying a host of modern technologies and recruitment tools for outreach ($$$). Yet, what separates the good agencies from the great is how relevant their outreach is and how well they utilize their tools. We help bridge that gap by harnessing the power of modern technology to build long-standing relationships with thousands of diverse and incredibly talented people.

We care about what we do and the people we work with. With our high-level understanding of the technology and its applications, we stay at the forefront of current market trends. So whether you are a candidate seeking a new role or a company looking to retain talent, please reach out to us. We look forward to working with you!

Industry

HR & Recruiting

Company Size

1-10 employees

Headquarters

Los Angeles, CA

Year Founded

2020

Website

rethinkrecruit.io

Founding Research Scientist - MLLM Training