Job Description
SRE - AWS
We are looking for an experienced and driven Senior Site Reliability Engineer (SRE) to architect, implement, and maintain robust cloud infrastructure. This role demands a deep understanding of AWS, Kubernetes, and ECS, as well as the ability to build scalable, secure, and highly available infrastructure from scratch. The ideal candidate will be a strong advocate for DevOps principles, automation, and reliability, and will have the skills to support and optimize complex microservices-based architectures.
Key Responsibilities
Infrastructure Design & Implementation
• Design and build highly scalable, fault-tolerant, and secure cloud infrastructure using AWS, Kubernetes, and ECS.
• Lead efforts in infrastructure as code (IaC) using tools like Terraform or CloudFormation.
• Develop and enforce best practices for infrastructure provisioning, security, and cost optimization.
System Reliability & Performance
• Ensure availability, performance, scalability, and security of production systems.
• Implement observability strategies including monitoring, logging, and alerting using tools such as Prometheus, Grafana, ELK, or Datadog.
• Analyse system performance metrics and proactively identify potential issues and bottlenecks.
DevOps & Automation
• Build and maintain CI/CD pipelines to streamline code deployments across environments.
• Drive automation in infrastructure provisioning, configuration management, and operational tasks.
• Ensure repeatable and reliable deployments using containers and orchestration tools like Kubernetes and ECS.
Service Management
• Own the SRE lifecycle, including incident management, postmortems, root cause analysis, and runbook creation.
• Collaborate closely with development and QA teams to ensure seamless microservices integration, deployment, and lifecycle management.
• Maintain service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.
Security & Compliance
• Implement and enforce cloud security best practices for networking, identity and access management, and data protection.
• Support audits, compliance assessments, and vulnerability remediation.
• Monitor for security anomalies and work with security teams to respond to threats.
Technical Skills
• 6+ years of hands-on experience in Site Reliability Engineering, DevOps, or Cloud Engineering.
• Expertise in AWS services such as EC2, S3, RDS, IAM, VPC, Lambda, CloudWatch, etc.
• Strong knowledge of Kubernetes and container orchestration best practices.
• Experience managing services on Amazon ECS (Fargate or EC2).
• Proficient in infrastructure-as-code tools like Terraform, CloudFormation, or Pulumi.
• Skilled in scripting languages such as Python, Bash, or Go.
• Solid grasp of networking, load balancing, DNS, and firewall rules in cloud environments.
• Deep understanding of microservices architectures, API gateways, and service meshes.
Soft Skills
• Proven leadership and cross-functional collaboration skills.
• Strong problem-solving and incident-resolution mindset.
• Clear communication, documentation, and stakeholder reporting abilities.
• Passion for continuous improvement and automation.
Preferred Qualifications
• AWS certifications such as AWS Certified DevOps Engineer, Solutions Architect – Professional, or equivalent.
• Familiarity with service meshes like Istio or Linkerd.
• Experience with serverless architectures and event-driven systems.
• Knowledge of regulatory compliance (SOC2, ISO 27001, GDPR) in cloud environments.
Skills – AWS Cloud, CI/CD, EC2, Kubernetes, Grafana, Datadog, Python
Key Responsibilities:
Cloud Platform: GCP
• Infrastructure Automation: Design, implement, and manage infrastructure as code using Terraform to provision and manage GCP resources.
• Container Orchestration: Deploy and manage Kubernetes clusters, ensuring efficient operation of containerized applications.
• Continuous Integration/Continuous Deployment (CI/CD): Develop and maintain CI/CD pipelines using Jenkins to automate application build, test, and deployment processes.
• Containerization: Collaborate with development teams to containerize applications using Docker and manage deployments with Helm Charts.
• Code Quality Assurance: Integrate and manage SonarQube to ensure code quality and security standards are met.
• Monitoring and Logging: Implement and manage monitoring solutions using Datadog to ensure system health, performance, and security.
• Collaboration: Work closely with cross-functional teams, including developers, QA, and operations, to streamline processes and improve productivity.
Requirements:
• Experience: 5+ years in DevOps or cloud engineering roles, with at least 3 years of relevant experience in the specified technologies.
• Technical Proficiency:
  o Hands-on experience with GCP services and architecture.
  o Proficiency in Terraform for infrastructure as code implementations.
  o Strong understanding and experience with Kubernetes and Docker.
  o Experience in setting up and managing CI/CD pipelines using Jenkins.
  o Familiarity with Helm Charts for application deployment.
  o Experience with SonarQube for code quality analysis.
  o Proficiency in monitoring and logging tools, particularly Datadog.
• Scripting Skills: Proficiency in scripting languages such as Bash or Python is an added advantage.
• Soft Skills:
  o Strong problem-solving abilities and analytical thinking.
  o Excellent communication skills, both verbal and written.
  o Ability to work collaboratively in a team environment.
  o Strong organizational and time management skills.
Skills – Terraform, Kubernetes, Cluster, Docker, GCP, SonarQube