Icertis

Lead Software Engineer, Cloud Site Reliability (SRE)

Icertis  •  Pune, IN (Onsite)  •  20 days ago

Job Description

Role Responsibilities:

  • Lead 24x7 NOC operations, including mandatory rotational shifts, to ensure system availability and SLA adherence

  • Act as Major Incident Manager (P1/P2 incidents), driving triage, war room coordination, and stakeholder communication

  • Implement and enhance observability practices across logs, metrics, and traces

  • Work with tools like Datadog and Azure Monitor for monitoring and alerting

  • Drive proactive monitoring, alert tuning, anomaly detection, and AIOps initiatives

  • Manage Azure infrastructure and AKS clusters, including troubleshooting, scaling, and performance tuning

  • Build automation and self-healing workflows using Terraform, ARM, Helm, Power Automate, and scripting

  • Collaborate with engineering teams to improve reliability, deployment pipelines, and cloud-native architecture

  • Develop dashboards and reports using Power BI and ServiceNow

  • Handle monthly business reviews and leadership reporting

  • Mentor team members and drive process standardization and operational excellence

Required Skills:

  • 7–12 years of experience in CloudOps SRE / NOC environments (24x7 operations)

  • Strong expertise in Azure Infrastructure (VMs, Networking, Storage)

  • Hands-on experience with Azure Kubernetes Service (AKS), Kubernetes, Docker

  • Strong experience with monitoring and observability tools (Datadog, Azure Monitor, Prometheus, Grafana)

  • Proven experience in Incident Management / Major Incident Handling and monthly reporting

  • Experience with Infrastructure as Code (Terraform, ARM templates, Helm)

  • Scripting skills in PowerShell, Python, or Bash

  • Experience with ServiceNow (Incident, Problem, Change modules and dashboards)

  • Strong reporting and analytics experience using Power BI and exposure to tools like Power Automate

  • Good understanding of distributed systems and cloud-native architecture

  • Excellent communication, leadership, and problem-solving skills

Preferred Skills:

  • Experience in multi-cloud environments (AWS/GCP)

  • Exposure to AIOps / predictive monitoring / self-healing systems

  • Azure / Kubernetes certifications

Icertis is the global leader in AI-powered contract intelligence. The Icertis platform revolutionizes contract management, equipping customers with powerful insights and automation to grow revenue, control costs, mitigate risk, and ensure compliance - the pillars of business success. Today, more than one third of the Fortune 100 trust Icertis to realize the full intent of millions of commercial agreements in 90+ countries.

About Icertis

Industry: IT & Software
Company Size: 1,001-5,000 employees
Headquarters: Bellevue, WA
Year Founded: 2009