ByteDance

Applied Scientist - LLM Training System as a Service - Global Frontier Tech Recruitment Program - 2027 Start (PhD)

ByteDance  •  $213k - $450k/yr  •  San Jose, CA (Onsite)  •  1 month ago

Job Description

We are looking for talented individuals to join our team in 2027. As a graduate, you will have opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.

Successful candidates must be able to commit to an onboarding date by the end of 2027. Please state your availability and graduation date clearly in your resume.

Team Introduction:
AML-MLsys combines system engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, and scalable systems for LLM/AIGC/AGI.

Topic Content:
With the evolution from large language models (LLMs) to AI Agents, the training paradigm is undergoing a fundamental shift. Traditional distributed training frameworks like Megatron-LM are designed around relatively static parallelism strategies, whereas Agent training introduces more dynamic patterns, including external tool interactions, multi-step reasoning, and iterative self-improvement.

In this context, tightly coupled system design can limit flexibility and efficiency. To better support these emerging workloads, we aim to build a robust architecture that cleanly separates “logical control” from “compute execution,” enabling more scalable and adaptable training workflows.

Responsibilities:
- Develop and optimize frameworks for LLM training, inference, and reinforcement learning (RL).
- Work closely with model researchers to scale up LLM training and reinforcement learning.
- Optimize GPU and CUDA performance to build an industry-leading, high-performance engine for LLM training, inference, and RL.

The base salary range for this position in the selected city is $212,800 - $450,000 annually.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services. More than 2.5 billion people interact with ByteDance products, including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life - a mission we work toward every day. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: China, CN
Year Founded: Unknown