TikTok

Site Reliability Engineer, Compute - USDS

TikTok  •  Seattle, WA (Onsite)  •  5 days ago

Job Description

Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you’ll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.

Responsibilities:

- Develop and maintain automation procedures to maximize system efficiency and minimize human intervention.

- Work closely with software engineering teams to design, deploy, and operate system components so that systems are functionally robust.

- Ensure system scalability to handle growth in web traffic and data.

- Implement monitoring tools and set up metrics to keep track of system health and performance.

- Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues.

- Conduct performance tests to find and address system bottlenecks.

- Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).

- Practice sustainable user support, incident response, and blameless postmortems.

About TikTok

Inspire Creativity and Bring Joy

Industry: Arts & Entertainment
Company Size: 10,000+ employees
Headquarters: Los Angeles, California
Year Founded: Unknown
Social Media