Job Description

At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team advancing the state of the art in AI, robotics, driving, and material sciences.

The Learning From Videos (LFV) team develops world foundation models that leverage large-scale multi-modal data (RGB, depth, flow, semantics, actions, tactile, audio, etc.) from multiple domains to power downstream embodied AI tasks. Our topics of interest include Video Generation, World Models, 4D Reconstruction, Multi-Modal Models, Multi-View Geometry, Data Augmentation, and Video-Language-Action models, with a primary focus on embodied applications such as robotics and driving. We are making progress on some of the hardest scientific challenges in spatio-temporal reasoning, and on how that reasoning can enable the deployment of autonomous agents in real-world, unstructured environments.

Our team is looking for a Research Engineer to help develop and deploy our world foundation models (WFMs) toward their major milestones in the autonomous driving domain. As our WFMs scale in both capability and ambition, we need a strong engineer who can bridge the gap between research ideas and production-grade systems. This is not a traditional software engineering role: you will work directly alongside research scientists, understand the research deeply enough to make independent technical decisions, and play a central role in bringing research breakthroughs into close-to-production environments.

As a Research Engineer, you will be responsible for supporting and optimizing large-scale distributed training of diffusion and transformer models; maintaining the infrastructure that ingests, unifies, and serves heterogeneous multi-modal datasets at scale; and developing tools and pipelines that accelerate the research-to-results cycle. You will work closely with researchers to prototype new ideas, run experiments, and help ship our most successful models into applications with real-world impact.

This role requires close collaboration with multiple TRI divisions (Robotics, Automated Driving, Human-Interactive Driving, etc.) as well as external Toyota and university partners, and the ability to reconcile and prioritize potentially competing requirements in a fast-paced environment that combines research and production.

Responsibilities

Collaborate directly with research scientists to implement, iterate on, and evaluate new architectures, objectives, datasets, and training strategies. Translate research prototypes into clean, maintainable, reusable code that will be shared across multiple TRI teams and the broader Toyota ecosystem.

Build and maintain scalable pipelines for ingesting, converting, validating, and serving heterogeneous robotics and autonomous driving datasets (multi-view, multi-modal, multi-embodiment, etc.) in unified, training-ready formats. Track and integrate new public and internal datasets as they become available.

Support and optimize large-scale distributed training of world foundation models on multi-GPU and multi-node clusters. Manage experiment workflows, profiling, debugging, and hyperparameter sweeps to keep training efficient and experiments on schedule.

Develop tools for dataset inspection, experiment tracking, model evaluation, GPU resource management, and visualization. Automate repetitive workflows to improve team velocity.

Work with other TRI teams and Toyota affiliates to set up shared pipelines, onboard their data, and support joint training and evaluation efforts.

Produce maintainable, well-documented code. Contribute to internal tooling and to open-source releases for the scientific community.

Qualifications

Master’s or PhD in Computer Science, Electrical Engineering, Machine Learning, or a related field, with a minimum of 2 years of relevant experience and strong software engineering skills.

Deep proficiency in Python, PyTorch, and the Unix/Linux toolchain. Comfort working in terminal-heavy, SSH-based workflows on shared GPU clusters.

Hands-on experience with large-scale deep learning training, including distributed training (DDP, FSDP, DeepSpeed, or similar), GPU profiling, and debugging training failures at scale.

Experience building data pipelines for heterogeneous or multi-modal datasets (images, video, depth, point clouds, actions, etc.).

Experience with video diffusion models, 3D/4D reconstruction, and multi-view geometry.

You are proactive, self-directed, and comfortable operating with ambiguity in a research-driven environment that spans multiple divisions.

You are a reliable teammate who communicates clearly and takes ownership of problems end-to-end.

Bonus Qualifications

Experience with cloud training infrastructure (AWS SageMaker, EC2) and containerized workflows (Docker, Kubernetes).

Familiarity with standard data formats and collection pipelines (ROS, MCAP, HDF5, etc.) as well as simulation environments.

Proficiency with modern AI-assisted development tools (e.g., Copilot, Cursor, Claude Code) for accelerating engineering workflows.

Track record of contributions to open-source projects or publications at top venues (CVPR, ICLR, NeurIPS, RSS, ICRA, etc.) is a plus but not required.

Please include links to any relevant open-source contributions or technical project write-ups with your application.

The pay range for this position at commencement of employment is expected to be between $180,000 and $258,750/year for California-based roles. Base pay offered will depend on multiple individualized factors, including, but not limited to, a candidate's experience, skills, job-related knowledge, and market location. TRI offers a generous benefits package including medical, dental, and vision insurance, 401(k) eligibility, paid time off benefits (including vacation, sick time, and parental leave), and an annual cash bonus structure. Additional details regarding these benefit plans will be provided if an employee receives an offer of employment.

Please reference this Candidate Privacy Notice to inform you of the categories of personal information that we collect from individuals who inquire about and/or apply to work for Toyota Research Institute, Inc. or its subsidiaries, including Toyota A.I. Ventures GP, L.P., and the purposes for which we use such personal information.

TRI is fueled by a diverse and inclusive community of people with unique backgrounds, education and life experiences. We are dedicated to fostering an innovative and collaborative environment by living the values that are an essential part of our culture. We believe diversity makes us stronger and are proud to provide Equal Employment Opportunity for all, without regard to an applicant’s race, color, creed, gender, gender identity or expression, sexual orientation, national origin, age, physical or mental disability, medical condition, religion, marital status, genetic information, veteran status, or any other status protected under federal, state or local laws.

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability. Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records for employment.

About Toyota Research Institute

At Toyota Research Institute (TRI), we’re conducting research to amplify human ability, focusing on making our lives safer and more sustainable. Led by Dr. Gill Pratt, TRI’s team of researchers develops technologies to advance automated driving, energy and materials, human-centered artificial intelligence, human interactive driving, large behavior models, and robotics. We’re dedicated to building a world of “mobility for all” where everyone, regardless of age or ability, can live in harmony with technology to enjoy a better life. Through innovations in AI, we will:

- Develop technology for vehicles and robots to help people enjoy new levels of independence, access, and mobility.

- Bring advanced mobility technology to market faster.

- Discover new materials that will make batteries and hydrogen fuel cells smaller, lighter, less expensive, and more powerful.

Our work is guided by a dedication to safety – in how we research, develop, and validate the performance of vehicle technology to benefit society. As a subsidiary of Toyota, TRI is fueled by a diverse and inclusive community of people who carry invaluable leadership, experience, and ideas from industry-leading companies. Over half of our technical team holds PhD degrees. We’re continually searching for the world’s best talent ‒ people who are ready to define the new world of mobility with us!

We strive to build a company that helps our people thrive, achieve work-life balance, and bring their best selves to work. At TRI, you will have the opportunity to enjoy the best of both worlds ‒ a fun start-up environment with brilliant people who enjoy solving tough problems, and the financial backing to successfully achieve our goals. Come work with TRI if you’re interested in transforming mobility by designing safer cars, enabling the elderly to age in place, or developing alternative fuel sources. Start your impossible with us.

Industry

Biotech & Life Sciences

Company Size

201-500 employees

Headquarters

Los Altos, California

Year Founded

2016

Website

tri.global

Senior Research Engineer, Computer Vision (LFV/WFM)