Job Description
Team Introduction
Our team is responsible for the infrastructure systems of the hybrid cloud, including IaaS, PaaS, SaaS, and AI model products. We strive to be a leading Site Reliability Engineering (SRE) team in the industry, driving reliability, scalability, and performance at scale.
As part of the SRE team, you will tackle complex, large-scale challenges, leveraging your expertise in coding, algorithms, complexity analysis, and distributed system design.
We foster a culture of diversity, intellectual curiosity, and open collaboration. Engineers are empowered with strong ownership, autonomy, and the opportunity to work across a wide range of impactful projects. You will also benefit from a supportive environment with mentorship and resources designed to help you continuously learn and grow.
What you will be doing:
1. Responsible for delivering products in hybrid cloud scenarios, including cloud platform planning, software deployment, and resource expansion. Collaborate with R&D teams to complete project delivery.
2. Responsible for operating cloud platform environments for internal and external customers, including daily alarm handling, on-call support, and change management, as well as ensuring the stability of the cloud platform during important events.
3. As an SRE, work with the R&D team on the stability of cloud products, and continuously improve capabilities in high-availability architecture, disaster recovery, alarm monitoring, and related areas, drawing on experience gained from large-scale systems on site.
4. Continuously improve hybrid cloud serviceability, participate in defining the standardized SOW for O&M and delivery of new product versions, and build SRE serviceability acceptance standards to improve implementation efficiency.