Job Title: Technical Support Manager, SRE (Cloud Managed Services)
Education: Any Graduate
Experience: 12+ years
Location: Mumbai
We are seeking an experienced SRE Support Manager to lead multi-cloud managed services support operations across Amazon Web Services, Microsoft Azure, and Google Cloud environments. This role will be responsible for ensuring platform reliability, operational excellence, SLA governance, and customer satisfaction while managing Level 1 and Level 2 SRE engineers and collaborating with Level 3 engineering teams.
The ideal candidate combines strong people leadership, customer management, and cloud operations expertise with a deep understanding of Site Reliability Engineering practices, including SLIs, SLOs, SLAs, error budgets, observability, automation, and incident management.
Experience Required:
12+ years of overall experience, including 3+ years in a team leadership, support management, or SRE management role.
Key Responsibilities:
Team Leadership & Support Operations:
Lead, mentor, and develop Level 1 and Level 2 SRE Support Engineers.
Manage 24x7 support coverage, shift planning, workforce utilization, and operational readiness.
Establish clear escalation matrices and support ownership models.
Drive upskilling across cloud technologies, troubleshooting, and SRE practices.
Customer & Service Delivery Management:
Manage support delivery for multiple enterprise managed services customers.
Understand customer expectations, business priorities, and critical workloads.
Act as the senior escalation point for high-priority incidents and service concerns.
Ensure proactive communication during outages, incidents, and service requests.
Reliability Engineering & SRE Governance:
Define and monitor Service Level Indicators (SLIs) for availability, latency, error rates, throughput, and ticket responsiveness.
Establish and govern Service Level Objectives (SLOs) aligned to customer needs.
Manage Error Budgets and balance reliability with speed of change.
Improve operational reliability through automation, standardization, and continuous improvement.
Reduce toil and repetitive manual support tasks.
Incident / Problem / Change Management:
Lead major incident management bridges and restoration activities.
Coordinate with Level 3 teams, cloud vendors, and customer stakeholders.
Drive Root Cause Analysis (RCA) and corrective and preventive actions.
Ensure controlled execution of change management, patching, releases, and maintenance.
SLA / KPI / Reporting:
Track contractual SLAs, operational KPIs, MTTR, MTTD, ticket aging, and backlog health.
Publish weekly/monthly service review dashboards.
Highlight risks, recurring issues, and improvement opportunities.
Ensure audit readiness and governance compliance.
Multi-Cloud Platform Management:
Oversee customer workloads on:
Amazon Web Services: EC2, RDS, EKS, Lambda, IAM, VPC, CloudWatch
Microsoft Azure: Azure VM, AKS, Azure SQL, VNets, Monitor, Defender
Google Cloud: Compute Engine, GKE, Cloud SQL, IAM, Operations Suite
Required Technical Skills:
Cloud & Infrastructure
Strong hands-on experience with one or more cloud platforms: Amazon Web Services / Microsoft Azure / Google Cloud.
Good understanding of compute, storage, networking, IAM, backup, DR, and security controls.
Experience with Linux and/or Windows server administration.
Knowledge of containers and orchestration platforms such as Kubernetes / Docker.
SRE & Reliability Engineering
Strong knowledge of SRE principles and best practices.
Experience designing and tracking SLI, SLO, and SLA frameworks.
Practical understanding of Error Budget policy management.
Expertise in incident response, on-call operations, postmortems, and resilience engineering.
Familiarity with capacity planning, availability engineering, and performance optimization.
Monitoring / Observability
Hands-on experience with:
Amazon CloudWatch
Azure Monitor
Google Cloud Operations Suite
Datadog
Grafana
Prometheus
Automation / DevOps
Experience with scripting: Python / Bash / PowerShell.
Infrastructure as Code using Terraform or similar.
CI/CD exposure using GitHub Actions, Jenkins, or similar tools.
Leadership Skills
Proven experience managing technical support or SRE operations teams.
Strong customer-facing communication skills.
Ability to manage escalations under pressure.
Strong decision-making and stakeholder management skills.
Preferred Qualifications:
ITIL Foundation / ITSM knowledge.
AWS / Azure / GCP certifications.
Experience in a managed services / MSP environment.
Experience leading 24x7 global support teams.
Success Metrics:
SLA / SLO attainment
Error budget compliance
MTTR reduction
Service availability improvement
Customer satisfaction (CSAT)
Ticket backlog health
Automation delivered
Team productivity and retention
Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leading technologies. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes, and is an AWS Advanced Tier Consulting Partner, a Microsoft Solutions Partner for Data & AI and Digital & App Innovation (Azure), an Oracle Partner, and a MySQL Partner.

Datavail | Data, Cloud & AI—Built for Real Business Outcomes
Datavail is a data, cloud, and AI consultancy that helps organizations turn complex technology environments into clear, measurable business outcomes.
We partner with data, technology, and IT leaders to make enterprise data more usable, systems more adaptable, and decisions more informed. Our work sits at the intersection of data management, cloud modernization, enterprise applications, and AI—bringing these disciplines together so they support the business, not slow it down.
In a landscape full of tools, platforms, and transformation promises, Datavail focuses on what actually drives progress:
• Trusted, well-managed data that teams can rely on
• Cloud environments without unnecessary cost or complexity
• Enterprise applications that evolve with the business
• Practical, responsible AI that delivers value—not experiments
We help organizations:
• Improve data quality, accessibility, and governance
• Turn analytics and AI into everyday decision-making tools
• Modernize and optimize cloud and application environments
• Reduce operational risk while increasing agility and performance
Our Core Capabilities:
• Data Management & AI: Data foundations, analytics, AI and machine learning that support real-world decisions
• Cloud Services: Cloud modernization, optimization, SRE services, and license optimization
• Enterprise Applications: Managed services, upgrades & integrations, digital transformation, and implementation services
At Datavail, we believe data only creates value when it’s well managed, well understood, and actively used. Our role is to help organizations move from complexity to clarity—and from data to action.