ByteDance

Senior Backend Engineer - AML Engine Orchestration

ByteDance  •  Singapore, SG (Onsite)  •  2 months ago

Job Description

Team Introduction
The mission of our AML team is to advance next-generation machine learning algorithms and platforms for the company's recommendation, ads ranking and search ranking systems. We also drive substantial impact on the company's core businesses.

Responsibilities:
1. Resource Efficiency Optimization in Distributed Orchestration and Scheduling:
- Develop and extend distributed orchestration frameworks within the Kubernetes/Godel ecosystem. Select appropriate frameworks based on different business scenarios, and optimize cluster utilization and load balancing strategies according to the specific characteristics of each scenario;
- Integrate and expand AutoScaling and automatic parallelization capabilities for various models and tasks. Employ load modeling and analytic methods for different models to automatically optimize resource requests, achieving large-scale improvements in resource usage efficiency and global optimality;
- Own preemption and rescheduling mechanisms for services with different priorities, and manage automatic resource multiplexing across different clusters and resource types; handle scheduling and load adaptation across multi-datacenter, multi-region, and multi-cloud environments.
2. Building Training System Architecture for Next-Generation Ultra-Large and Ultra-Deep Recommendation Models:
- Develop a flexible, elastic and robust distributed training runtime focused on hyper-scaled embeddings and large-scale GPU training;
- Design and optimize distributed computing APIs and runtimes geared towards future recommendation and ads model paradigms (e.g., reinforcement learning, fine-tuning and/or distillation);
- Collaborate with platform teams to enhance the diagnosability and usability of distributed training systems.
3. Constructing Online Orchestration Architecture for Next-Generation Recommendation Systems:
- Build a robust distributed model inference architecture for online learning scenarios involving hyper-scaled embeddings;
- Improve the usability of online recommendation and ads model architectures and MLOps workflows.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services - over 2.5bn people interact with ByteDance products including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life - a mission we aim towards achieving every day. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry
IT & Software
Company Size
10,000+ employees
Headquarters
China, CN