ByteDance

Research Scientist Graduate (Foundation Model-Speech-Interaction & Learning) - 2026 Start (PhD)

ByteDance  •  $245k - $450k/yr  •  San Jose, CA (Onsite)  •  2 months ago

Job Description

About the Team
Established in 2023, the ByteDance Seed team is dedicated to pioneering new paths toward artificial general intelligence. We aspire to advance the frontier of intelligence to drive progress for both technology and society.

With a long-term vision for the AI sector, the Seed team's research spans MLLM, GenMedia, AI for Science, and Robotics. We maintain a global presence with laboratories and career opportunities across China, Singapore, and the United States. To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers over 50 application scenarios (including Doubao, Jimeng, TRAE, Dola, and Dreamina) and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao App ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption.

The mission of the Seed Speech team is to enrich interactive and creative processes through the application of multimodal speech technologies. The team focuses on the forefront of research and product development in speech and audio, music, natural language understanding, and multimodal deep learning.

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.

Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

Responsibilities
- Contribute cutting-edge research to ByteDance product evolution (e.g., Douyin and CapCut) to impact billions of users worldwide.
- Work on advanced science and technology in audio processing and generation (e.g., Dialogue Systems, Audio-Video Models, Speech Synthesis, Voice Conversion, Audio Codec Learning, and Audio Language Modeling).
- Research, model, design, develop and evaluate novel machine learning models and algorithms.
- Collaborate with globally based researchers and engineering teams in developing machine learning models and algorithms.

The base salary range for this position in the selected city is $244,800 - $450,000 annually.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services. Over 2.5 billion people interact with ByteDance products, including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life, a mission we work toward every day. At ByteDance, we create together and grow together. That's how we drive impact for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: China, CN
Year Founded: Unknown