NTT DATA

Site Reliability Engineer

NTT DATA  •  Singapore, SG (Onsite)  •  2 months ago

Job Description

Role: Site Reliability Engineer (12-month renewable contract)

Experience: Minimum of 5 years

Location: Changi Business Park

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our growing Observability team. The ideal candidate will have a strong background in building and maintaining robust observability environments, including monitoring, logging, and tracing systems. This role will focus on the design, implementation, and support of our observability infrastructure, ensuring the seamless onboarding of applications and providing critical support during incidents.

Responsibilities:

  • Observability Environment Management: Design, build, and maintain our observability infrastructure, including monitoring tools, logging platforms, and distributed tracing systems (e.g., Prometheus, Grafana, Elasticsearch). This includes capacity planning, performance tuning, and ensuring high availability.
  • Application Onboarding: Work with development teams to onboard applications to our observability platform, providing guidance on instrumentation best practices and ensuring data quality. This includes creating and maintaining documentation and training materials.
  • Incident Support: Provide timely and effective support during incidents, leveraging observability data to diagnose and resolve issues quickly. This includes contributing to post-incident reviews and implementing preventative measures.
  • Automation: Automate repetitive tasks and processes related to observability, improving efficiency and reducing manual effort. This may involve scripting, developing tools, or integrating with CI/CD pipelines.
  • Alerting and Monitoring: Develop and maintain effective alerting strategies, ensuring appropriate escalation procedures and minimizing noise. This includes creating dashboards and reports to visualize system health and performance.

Qualifications:

  • Bachelor's degree in computer science or a related field, or equivalent experience.
  • 5+ years of experience as an SRE or in a similar role with a focus on observability.
  • Strong understanding of distributed systems and microservices architectures.
  • Experience with monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, Jaeger, Elasticsearch, Fluentd, Datadog, Dynatrace).
  • Proficiency in scripting languages such as Python, Go, or Bash.
  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration skills.

Bonus Points:

  • Experience with cloud platforms.
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).

About NTT DATA

NTT DATA, a part of NTT Group, is an IT and business services provider headquartered in Tokyo. We help clients transform through consulting, industry solutions, business process services, digital and IT modernization, and managed services. NTT DATA enables clients, as well as society, to move confidently into the digital future. We are committed to our clients’ long-term success and combine global reach with local client attention to serve them in over 50 countries.

Industry
IT & Software
Company Size
10,000+ employees
Headquarters
Tokyo, JP