TikTok

AI Infra Engineer - Large Model Inference Systems (Multimodal/LLM/VLM)

TikTok  •  San Jose, CA (Onsite)  •  2 days ago

Job Description

About the Team

We are dedicated to building the inference infrastructure for ultra-large-scale language models, vision-language models, and frontier multimodal AI systems. Our mission is to provide a robust, scalable, and high-performance foundation for distributed serving, heterogeneous scheduling, and low-latency inference at massive scale. You will work on some of the most challenging problems in large-model online serving, spanning traffic orchestration, throughput and latency optimization, kernel efficiency, and production reliability for next-generation AI systems.

Responsibilities - What You'll Do

- Build and evolve next-generation inference systems for large-scale online traffic, including global scheduling across heterogeneous compute resources, high-concurrency load balancing, and efficient batch formation

- Optimize distributed inference for models with 200B+ parameters and for complex multimodal models, using tensor parallelism (TP), expert parallelism (EP), data parallelism (DP), and related strategies to improve throughput and latency in production

- Develop high-performance kernels for frontier model architectures such as MoE, emerging attention mechanisms, and multimodal fusion layers using CUDA, Triton, and related tools

- Explore AI-driven infrastructure for inference systems, including AI agents for kernel optimization, performance tuning, consistency validation, deployment pipelines, and intelligent operations

About TikTok

Inspire Creativity and Bring Joy

Industry: Arts & Entertainment
Company Size: 10,000+ employees
Headquarters: Los Angeles, California
Year Founded: Unknown