
Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model—a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.
We’re creating AI that understands subtle human signals—a raised eyebrow, a micro-expression, a shift in posture—and responds naturally in context. This work sits at the frontier of multimodal modeling, real-time systems, and generative video.
We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels. Our team includes world-class researchers from MIT, UW, and Oxford with deep experience at Apple and Meta shipping low-latency ML products used by millions.
High-quality video generation has progressed rapidly, but real-time, expressive, controllable video diffusion for interactive systems remains unsolved. Most approaches are offline, slow, or brittle under real-world constraints.
This role exists to define and own video diffusion research inside a broader human foundation model. As a Founding Research Scientist, you’ll shape how motion, expression, and visual coherence are learned, generated, and integrated into real-time multimodal systems.
This is a true blank-page role. You’ll decide which research paths are worth pursuing, how models are trained and evaluated, and how breakthroughs turn into systems that actually work.
You’ll help build the first human foundation model operating across text, speech, facial expression, and body language in real time.
Your work will enable systems that:
Understand fine-grained visual signals such as facial expression, gesture, and posture
Generate lifelike, temporally coherent video with expressive motion and identity consistency
Power real-time avatars whose expressions and movements evolve frame-by-frame during interaction
The field is wide open. While current systems excel at static visuals or offline generation, real-time, multimodal video generation remains a foundational challenge—and this role is about defining that future.
You’ll operate as a founding-level researcher with end-to-end ownership over video diffusion research and its path to production.
You will:
Design and train state-of-the-art video diffusion and generative vision models
Own the full ML pipeline, from dataset construction and preprocessing to large-scale training and evaluation
Explore architectures and objectives for temporal coherence, controllability, and identity preservation
Push research breakthroughs into practical, low-latency systems
Develop benchmarks and evaluation strategies for expressive, real-time video generation
Write clean, maintainable research code that supports rapid iteration
Collaborate closely with researchers across speech, language, and multimodal learning
You enjoy frontier research, but you care deeply about real-world constraints. You’re comfortable navigating ambiguity, setting your own technical direction, and collaborating across domains.
You likely:
Love blank-page research problems and defining new problem spaces
Move quickly from ideas to experiments to insights
Think deeply about model behavior, failure modes, and evaluation
Thrive in small, highly technical, in-person teams
PhD or equivalent experience in video diffusion, generative vision, or closely related fields
Strong background in training large-scale generative models for images or video
Deep expertise in modern deep learning and large-scale training systems
Experience running the full ML lifecycle, from data curation through evaluation
Ability to translate research ideas into practical systems
Strong coding skills and a commitment to clean, maintainable research code
Clear communication and strong collaboration skills
Publications at top ML, vision, or generative modeling conferences
Experience with real-time or low-latency generative systems
Prior work on avatars, facial animation, or human motion modeling
Experience shipping ML systems used by real users
Joining Nuance Labs now means defining the visual foundation of a category-defining AI system. You’ll have outsized influence over core research directions, work closely in person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.

At Rethink Recruit, we combine an old-school work ethic with a modern approach to recruitment. Our competitive edge comes from our niche focus on Autonomous Driving, EVs, AI, Robotics, and Blockchain (FinTech), and from our ability to adapt to new and emerging technologies.
For our clients, it allows them to tap into our talent pool of tens of thousands of pre-qualified, industry- and skill-specific candidates with whom we have developed close working relationships over the past decade.
For our candidates, it allows them to spend less time on their job search by engaging with more industry-specific companies and more skill-specific jobs.
We believe that any agency can find marginal success by deploying a host of modern technologies and recruitment tools for outreach ($$$). Yet, what separates the good agencies from the great is how relevant their outreach is and how well they utilize their tools. We help bridge that gap by harnessing the power of modern technology to build long-standing relationships with thousands of diverse and incredibly talented people.
We care about what we do and the people we work with. Our high-level understanding of the technology and its applications keeps us at the forefront of current market trends. So whether you are a candidate seeking a new role or a company looking to retain talent, please reach out to us. We look forward to working with you!