Datavail

Senior Specialist - Cloud SRE - Azure, AKS & DevOps

Datavail  •  Mumbai, IN (Hybrid)  •  21 days ago

Job Description

Job Title: Senior Specialist (SRE) - Azure, AKS & DevOps

Education: Any Graduate

Experience: 8 to 15 years

Location: Mumbai

Key Skills:

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AKS, DevOps, automation, and enterprise operations to lead reliability engineering and managed services delivery for production cloud environments.

This role focuses on ensuring 24x7 availability, performance, security, patch compliance, scalability, and automation across Azure-first environments with exposure to AWS/GCP.

You will work closely with customers, internal engineering teams, and leadership to drive cloud transformation, implement SRE best practices, modernize DevOps delivery pipelines, and improve measurable service outcomes.

Primary Responsibilities

Reliability Engineering & SRE Practices

  • Define and manage SLIs, SLOs, error budgets, and availability targets, and track MTTR and change failure rate.

  • Continuously improve platform reliability, scalability, resilience, and operational maturity.

  • Lead Sev-1 / Sev-2 incident management, escalation handling, and RCA reviews.

  • Conduct blameless postmortems and drive preventive actions.

  • Build operational runbooks, self-healing automation, and on-call processes.

  • Participate in architecture reviews for HA, DR, failover, and performance optimization.

Azure Cloud Operations & Engineering

  • Manage enterprise Azure environments, including:

      • Azure Virtual Machines

      • VM Scale Sets

      • Azure App Services

      • Azure Functions

      • Azure SQL / Managed Instance

      • Azure Storage

      • Virtual Networks / NSGs

      • Application Gateway / WAF

      • Azure Front Door

      • Load Balancers

      • Azure Backup & Site Recovery

  • Implement Azure Well-Architected Framework best practices.

  • Drive governance using Management Groups, Azure Policy, RBAC, Key Vault, and Microsoft Defender for Cloud.

  • Optimize cost using Reserved Instances, rightsizing, budgets, and a tagging strategy.

AKS & Container Platform Engineering

  • Design, manage, and optimize Microsoft Azure Kubernetes Service (AKS) clusters.

  • Manage cluster upgrades, autoscaling, node pools, ingress controllers, storage classes, and security policies.

  • Support container deployments using Helm, YAML manifests, and GitOps workflows.

  • Improve AKS observability using Prometheus, Grafana, and Azure Monitor for Containers.

  • Ensure platform reliability for microservices workloads.

DevOps, CI/CD & Automation

  • Build and manage CI/CD pipelines using Azure DevOps, GitHub Actions, Jenkins, or GitLab CI.

  • Implement blue/green, rolling, and canary deployments with rollback strategies.

  • Automate infrastructure using Terraform, ARM Templates, and Bicep.

  • Develop scripts and tools in PowerShell, Bash, Python, and Go.

  • Automate patching, backup validation, scaling, compliance checks, and recovery tasks.

  • Reduce manual operational toil through self-service automation.

Patching, Security & Compliance

  • Own enterprise patch management for Windows/Linux workloads using Azure Update Manager.

  • Manage maintenance windows and zero-downtime patch strategies.

  • Implement CIS Benchmark controls, vulnerability remediation, and audit compliance controls.

  • Secure workloads with Key Vault, Private Link, NSGs, Conditional Access, PIM, and Microsoft Defender.

  • Support hybrid environments using Azure Arc-enabled servers.

Observability & Monitoring

  • Build and maintain monitoring platforms using:

      • Azure Monitor

      • Log Analytics

      • Application Insights

      • Grafana

      • Datadog

      • New Relic

      • Prometheus

  • Build executive dashboards, SRE scorecards, SLA reports, and capacity trend reports.

  • Tune alerts to reduce noise and surface only actionable issues.

Customer Engagement & Leadership

  • Serve as the primary technical contact for enterprise customers.

  • Present monthly service reviews, patch compliance, reliability metrics, and improvement plans.

  • Mentor L1/L2 engineers and guide technical escalations.

  • Collaborate with customer architects, security teams, and developers.

  • Lead cloud modernization and operational excellence initiatives.

Required Qualifications

Experience

  • 8 to 10 years in SRE, DevOps, Cloud Engineering, or Production Operations.

  • At least 5 years of hands-on experience with Microsoft Azure production environments.

  • Proven experience managing critical enterprise workloads.

  • Strong customer-facing / managed services background preferred.

Technical Skills

Azure

  • Deep expertise in Azure compute, networking, storage, identity, monitoring, backup, and DR.

  • Strong hands-on experience with AKS, Azure DevOps, Azure Policy, and Key Vault.

DevOps / Automation

  • Terraform, ARM, Bicep, CI/CD pipelines.

  • PowerShell, Bash, Python scripting.

Containers

  • Kubernetes, Docker, AKS operations.

Monitoring

  • Azure Monitor, Grafana, Datadog, Prometheus, Log Analytics.

Operations

  • Incident management, RCA, patching, performance tuning, DR drills.

Preferred Certifications

  • Microsoft AZ-104

  • AZ-305

  • AZ-400

  • AZ-500

  • Amazon Web Services Associate / Professional

  • CKA / Terraform Associate / SRE Foundation

Nice to Have

  • Multi-cloud (AWS / GCP) experience

  • Chaos engineering

  • FinOps knowledge

  • MSP / Managed Services experience

  • Large-scale enterprise operations

  • Security / Compliance frameworks (ISO 27001, SOC 2, HIPAA, PCI DSS)


Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leading technologies. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes, and is an AWS Advanced Tier Consulting Partner, a Microsoft Solutions Partner for Data & AI and Digital & App Innovation (Azure), an Oracle Partner, and a MySQL Partner.

About Datavail

Datavail | Data, Cloud & AI—Built for Real Business Outcomes

Datavail is a data, cloud, and AI consultancy that helps organizations turn complex technology environments into clear, measurable business outcomes.

We partner with data, technology, and IT leaders to make enterprise data more usable, systems more adaptable, and decisions more informed. Our work sits at the intersection of data management, cloud modernization, enterprise applications, and AI—bringing these disciplines together so they support the business, not slow it down.

In a landscape full of tools, platforms, and transformation promises, Datavail focuses on what actually drives progress:

• Trusted, well-managed data that teams can rely on

• Cloud environments without unnecessary cost or complexity

• Enterprise applications that evolve with the business

• Practical, responsible AI that delivers value—not experiments

We help organizations:

• Improve data quality, accessibility, and governance

• Turn analytics and AI into everyday decision-making tools

• Modernize and optimize cloud and application environments

• Reduce operational risk while increasing agility and performance

Our Core Capabilities:

• Data Management & AI: Data foundations, analytics, AI and machine learning that support real-world decisions

• Cloud Services: Cloud modernization, optimization, SRE services, and license optimization

• Enterprise Applications: Managed services, upgrades & integrations, digital transformation, and implementation services

At Datavail, we believe data only creates value when it’s well managed, well understood, and actively used. Our role is to help organizations move from complexity to clarity—and from data to action.

Industry
IT & Software
Company Size
501-1,000 employees
Headquarters
Boulder, Colorado
Year Founded
2007