AI Infra Engineer - Large Model Inference Systems (Multimodal/LLM/VLM)

TikTok • San Jose, CA (Onsite) • 2 days ago

Job Description

About the Team

We are dedicated to building the inference infrastructure for ultra-large-scale language models, vision-language models, and frontier multimodal AI systems. Our mission is to provide a robust, scalable, and high-performance foundation for distributed serving, heterogeneous scheduling, and low-latency inference at massive scale. You will work on some of the most challenging problems in large-model online serving, spanning traffic orchestration, throughput and latency optimization, kernel efficiency, and production reliability for next-generation AI systems.

Responsibilities - What You'll Do

- Build and evolve next-generation inference systems for large-scale online traffic, including global scheduling across heterogeneous compute resources, high-concurrency load balancing, and efficient batch formation

- Optimize distributed inference for 200B+ parameter models and complex multimodal models using tensor parallelism (TP), expert parallelism (EP), data parallelism (DP), and related strategies to improve throughput and latency in production

- Develop high-performance kernels for frontier model architectures such as MoE, emerging attention mechanisms, and multimodal fusion layers using CUDA, Triton, and related tools

- Explore AI-driven infrastructure for inference systems, including AI agents for kernel optimization, performance tuning, consistency validation, deployment pipelines, and intelligent operations

About TikTok

Inspire Creativity and Bring Joy

Industry

Arts & Entertainment

Company Size

10,000+ employees

Headquarters

Los Angeles, California

Website

tiktok.com
