
Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model—a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.
We’re working toward AI that understands subtle human signals—a shift in tone, a hesitant pause, a quirked eyebrow—and responds in a way that feels genuinely human. This is foundational work at the intersection of speech, multimodal learning, and real-time systems.
We’re backed by a $10M seed round from Accel, South Park Commons, Lightspeed, and top angels. Our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta shipping ultra-low-latency ML systems used by millions.
Speech is at the core of human interaction—and it’s the backbone of truly human AI. While today’s voice systems have made progress on prosody and naturalness, real-time, emotionally grounded, multimodal speech generation remains unsolved.
This role exists to own and push the frontier of speech synthesis inside a broader human foundation model. As a Founding Research Scientist, you’ll help define how speech models are trained, evaluated, and integrated into a real-time system that unifies voice, language, and expression.
This is a blank-page role with real agency. You’ll help decide what problems matter, how we approach them, and how research turns into systems that actually work in the world.
You’ll help create the first human foundation model that operates across text, speech, facial expression, and body language in real time.
Your work will contribute to systems that:
Understand fine-grained human signals, from vocal nuance to subtle changes in expression
Generate lifelike, responsive speech that adapts frame-by-frame to context and emotion
Power real-time avatars whose voice, tone, and expression evolve naturally in interaction
This is a rare opportunity to shape foundational technology in a space where the boundaries are still being defined.
You’ll operate as a founding-level researcher with end-to-end ownership over speech synthesis research and its path to production.
You will:
Design, train, and evaluate state-of-the-art speech synthesis and audio generation models
Own the full ML pipeline, from data wrangling and rapid prototyping to large-scale training and benchmarking
Push research breakthroughs into practical, real-time systems
Explore new architectures and training strategies for expressive, low-latency speech generation
Write clean, maintainable research code that supports fast iteration
Collaborate closely with researchers across vision, language, and multimodal modeling
You’re someone who loves frontier research—but you also care deeply about whether things actually work. You’re comfortable with ambiguity, motivated by unsolved problems, and excited to chart your own course.
You likely:
Enjoy blank-page research problems and setting your own technical direction
Move quickly from ideas to experiments to results
Care about both model quality and real-world constraints like latency and stability
Thrive alongside other highly driven, deeply technical collaborators
PhD or equivalent experience in speech synthesis, audio generation, or closely related fields
Deep expertise in training speech or audio models (e.g., TTS, speech-to-speech, neural vocoders)
Strong command of modern deep learning methods and large-scale training workflows
Experience running the full ML lifecycle, from dataset construction through evaluation
Ability to translate research insights into working systems
Strong coding skills and a commitment to clean, maintainable research code
Clear communication and strong collaboration skills
Publications at top ML, speech, or audio conferences
Experience with real-time or low-latency ML systems
Prior work on multimodal models involving speech, vision, or language
Experience shipping ML systems used by real users
Joining Nuance Labs now means shaping the core research direction of a company tackling one of the hardest problems in AI: real-time, emotionally intelligent human interaction.
You’ll have outsized ownership, direct influence on foundational systems, and the chance to work in-person with a world-class team that blends frontier research with product-grade engineering. If you want your research to define a new category—not just incrementally improve an existing one—this role offers that opportunity.

At Rethink Recruit, we combine an old-school work ethic with a modern approach to recruitment. Our competitive edge comes from our niche focus on Autonomous Driving, EVs, AI, Robotics, and Blockchain (FinTech), and from our ability to adapt to new and emerging technologies.
For our clients, this focus means access to our talent pool of tens of thousands of pre-qualified, industry- and skill-specific candidates with whom we have developed close working relationships over the past decade.
For our candidates, it means spending less time on the job search by engaging with more industry-specific companies and more skill-specific jobs.
We believe that any agency can find marginal success by deploying a host of modern technologies and recruitment tools for outreach ($$$). Yet, what separates the good agencies from the great is how relevant their outreach is and how well they utilize their tools. We help bridge that gap by harnessing the power of modern technology to build long-standing relationships with thousands of diverse and incredibly talented people.
We care about what we do and the people we work with. Coupled with our high-level understanding of the technology and its applications, this keeps us at the forefront of current market trends. So whether you are a candidate seeking a new role or a company looking to retain talent, please reach out to us. We look forward to working with you!