Job Description
We are looking for talented individuals to join our team in 2027. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at our Company.
Successful candidates must be able to commit to an onboarding date by the end of 2027. Please state your availability and graduation date clearly in your resume.
Team Introduction: Our Arch-Data Ecosystem team plays a crucial role in the data ecosystem of the TikTok Recommendation System. We build offline and real-time data storage solutions for large-scale recommendation, search, and advertising businesses, serving over 1 billion users. The team's core goals are high system reliability, uninterrupted service, and smooth data processing. We are committed to building storage and computing infrastructure that adapts to various data sources and meets diverse storage requirements, ultimately providing efficient, cost-effective, and user-friendly data storage and management tools for the business.
Topic Content: Building a unified infrastructure that integrates the "training data base" and the "training/inference state system" for multimodal foundation models in search, recommendation, and advertising scenarios. By jointly optimizing data lakes, caching, distributed computing, and GPU I/O, we aim to reduce training and inference costs for foundation models while improving iteration efficiency.
Responsibilities:
- Design and implement real-time and offline data architecture for large-scale recommendation systems.
- Build scalable and high-performance streaming Lakehouse systems that power feature pipelines, model training, and real-time inference.
- Collaborate with ML platform teams to support PyTorch-based model training workflows and design efficient data formats and access patterns for large-scale samples and features.
- Own core components of our distributed storage and processing stack, from file formats through stream compaction to metadata management.