TikTok

Site Reliability Engineer, Global E-commerce

TikTok  •  San Jose, CA (Onsite)  •  3 months ago

Job Description

The Global E-commerce Service Architecture team ensures the availability, scalability, and resilience of TikTok’s e-commerce platform in the U.S., partnering closely with product and engineering teams to operate reliable, large-scale production systems.

We are seeking a Site Reliability Engineer (SRE) to advance the stability and resilience of TikTok Global E-commerce services in the U.S. In this role, you will strengthen disaster recovery readiness, optimize infrastructure capacity, and elevate service stability.

Key Responsibilities:

- Data Center Disaster Recovery: Ensure services maintain disaster recovery capabilities during normal operations through contingency planning, drills, and capacity assurance, and respond effectively in disaster scenarios.

- Resource Management & Capacity Planning: Manage and plan server and compute resources, including resource restructuring, overall capacity planning, and dynamic scaling, to support reliable business deployment and operations.

- Service Stability Improvement: Establish and enhance service monitoring systems to enable timely alerting on failures and rapid issue identification and resolution. Partner with business stakeholders to conduct ongoing stability governance.

About TikTok

Inspire Creativity and Bring Joy

Industry: Arts & Entertainment
Company Size: 10,000+ employees
Headquarters: Los Angeles, California
Year Founded: Unknown