Job Description
This position is with TikTok's Stability Assurance Team. The team is responsible for ensuring that the services TikTok provides are highly reliable and low-latency. Reliability assurance for any large-scale application system is a complex, systematic undertaking. The team optimizes the application architecture end to end, guided by data analysis and supported by automated, intelligent failure recovery.
Job Responsibilities:
1. Ensure the online stability of TikTok and improve product SLAs through systematic disaster recovery capabilities, standardized emergency response mechanisms, and intelligent analysis.
2. Identify system risks and drive their remediation using comprehensive, multi-perspective quality data.
3. Establish unified standards and specifications for TikTok, design and develop a one-stop operations platform, and improve efficiency across multiple domains.
4. Collaborate closely with developers to implement SRE best practices.