Job Description
Who we are
Our client operates in a highly technology-driven environment, where digital solutions play a key role in shaping internal processes and external interactions. As part of an ongoing transformation journey, there is a strong focus on enhancing engineering capabilities, adopting agile delivery models, and modernizing the existing IT landscape through strategic, long-term investments, including cloud technologies.
As a Site Reliability Engineer, you will contribute to the overarching implementation and operation of our client's Online Banking platform in Google Cloud and become a central part of the feature squads, based on the paradigm "you build it, you run it".
Location: Bucharest
What You’ll Be Doing
- Define Service Level Objectives (SLOs) and enable an end-to-end view of customer satisfaction, based on best practices for setting up Service Level Indicators (SLIs), to create effective strategies for maintaining and improving system performance and availability
- Collaborate with Business Functional Analysts and Solution Architects to identify solution design changes that increase the resilience of technical solutions early on
- Consult and guide the squad on the prioritization of reliability improvements and actively deliver them as part of the sprint
- Apply hands-on experience in implementing reliability and resilience patterns such as auto-scaling, circuit breakers, bulkheads, rate limiters, and retry mechanisms
- Actively work on service request fulfilment and on incident and problem management to identify and reduce toil and MTTR using engineering best practices
- Align with the SRE chapter function on, and contribute to, state-of-the-art SRE practices, e.g. distributed tracing, OpenTelemetry, and chaos engineering
- Act as a knowledge and skill multiplier for your profession by leading the Site Reliability Engineer population
- Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards
- Lead the engineers in your profession and help them improve every day
What We’re Looking For
- Bachelor’s degree in Computer Science, Engineering, or related field
- Minimum of 5 years of proven work experience as a Reliability Engineer or in a similar role
- Expert knowledge of and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform, as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform, or similar technology
- Experience in resilient software development in Python/Java and in the use of modern CI/CD pipelines and tooling, e.g. GitHub, GitHub Actions, Bitbucket, Helm
- Strong experience in setting up observability, monitoring, and self-healing solutions, for instance with New Relic, Splunk, Google Cloud Operations, Lightstep, and Ansible
- Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt) and microservice architectures, and experience with API management using Apigee or WSO2
- Proactive attitude and collaborative team-player mindset paired with self-confidence
- Ability to keep your cool and maintain an eye for detail, even in stressful situations where time matters
- A creative approach to solving technical problems
- Excellent communication skills in English