Job Description
About the Team
The team builds and operates large-scale, massively distributed infrastructure, applying Site Reliability Engineering (SRE) principles of software and systems engineering to ensure that our traffic services are reliable, fault-tolerant, efficiently scalable, and cost-effective. You will have the opportunity to manage a variety of complex systems at scale, including traffic systems serving hyperscale data centers and public cloud environments, as well as a global load balancer that handles terabits per second (Tbps) of traffic.
We build and operate large-scale, multi-cloud network services around the world that accelerate and optimize network traffic for TikTok and for a variety of application services used by ByteDance's internal customers. These services include, but are not limited to, Layer 4 load balancing, Layer 4/7 acceleration, global ingress, CMAF, FaaS, and WAF. By joining us, you will work within a brilliant team and learn how to build a TikTok-scale network traffic platform that serves billions of users globally.
Responsibilities
- Build, expand and operate ByteDance’s global traffic platform, including large-scale systems in public and private clouds and edge data centers.
- Build tools, automation, visualizations and monitors to facilitate the operation and optimization of the global traffic platform.
- Work in a fast-paced environment and participate in technical operations and on-call rotations, responding to performance and reliability issues.
- Help improve the whole lifecycle of infrastructure services, from inception and design through development, deployment, user support and refinement.