Job Description
Job Title: Senior Site Reliability Engineer (SRE)
Location: Hybrid – San José, Costa Rica (2–3 days in office)
Type of Contract: Full-Time (initially through an Employer of Record (EOR), transitioning to direct employment)
Salary Range: Market Rates
Language Requirements: Advanced English (Required)
We are seeking a skilled Senior Site Reliability Engineer with strong hands-on experience in infrastructure automation, Kubernetes, and CI/CD pipelines to join our growing team. You will play a key role in building, securing, and optimizing scalable infrastructure and deployment systems across hybrid environments. Your work will directly impact system reliability, deployment efficiency, and the overall performance of mission-critical platforms.
Key Responsibilities
Design and implement infrastructure automation to enable consistent, repeatable deployments across on-premises and customer-managed environments
Develop and maintain CI/CD pipelines using tools such as GitHub Actions and ArgoCD to improve deployment speed and reliability
Manage and optimize Kubernetes clusters, including application packaging and deployment using tools like Grafana Tanka and Kustomize
Build and maintain observability systems (monitoring, logging, alerting) using the Grafana stack
Troubleshoot and resolve performance and reliability issues, including scaling, latency, and resource allocation challenges
Implement security best practices including container security, vulnerability scanning, and network hardening
Collaborate with engineering teams to support infrastructure needs, troubleshoot environments, and improve developer experience
Must-Have Qualifications
3–5 years of hands-on, practical experience in Site Reliability Engineering, DevOps, or infrastructure engineering (purely theoretical knowledge is not sufficient)
Strong background in the software development lifecycle, with an engineering-first mindset
Proven hands-on experience with Kubernetes in production environments (deployment, operations, troubleshooting)
Experience building and maintaining CI/CD pipelines (GitHub Actions, ArgoCD, or similar tools)
Solid understanding of infrastructure-as-code and configuration management practices
Experience with observability and monitoring tools, preferably the Grafana stack
Strong problem-solving skills, with the ability to explain technical processes and decisions clearly using real-world examples
Preferred Qualifications
Experience working in hybrid or on-premises infrastructure environments (not solely cloud-native)
Familiarity with VMware-based environments
Experience with DevSecOps practices and security-focused infrastructure design
Exposure to customer-managed or air-gapped deployment environments
Prior experience mentoring junior engineers or supporting cross-functional teams