Job Description
Site Reliability Engineer, Government
- Title of Role: Site Reliability Engineer, Government
- Location: On-site, hybrid
- Company Stage of Funding: Secondary Market — Software Development
- Office Type: Hybrid
- Salary: [To be confirmed with final candidates]
We're representing an organization that develops advanced data integration and analysis software for the government, defense, intelligence, finance, and healthcare sectors. Their platforms help analysts navigate complex and sensitive data sets, support collaboration across agencies, and turn fragmented information into actionable insights. In this mission-driven environment, engineering work contributes directly to national security and critical government operations.
What You Will Do
- Design, build, and maintain reliable, scalable software platforms within US Government environments.
- Collaborate with software engineering teams to establish and uphold service level objectives (SLOs) and reliability standards.
- Lead incident management initiatives, including on-call rotations and post-incident reviews.
- Automate operational processes using scripting and infrastructure-as-code to enhance system resilience.
- Optimize deployment pipelines and manage containerized workloads within secure government networks.
- Utilize observability tools to monitor system health and proactively resolve performance issues.
- Work with government stakeholders to ensure compliance with security and accreditation requirements.
Ideal Candidate Background
- Proven experience in Site Reliability Engineering, DevOps, or Platform Engineering, ideally with large-scale systems.
- Familiarity with US Government, defense, or intelligence community environments, especially regarding secure networks.
- Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash, Java) for automation and tooling.
- Experience with Kubernetes for container orchestration and Docker for containerization.
- Knowledge of infrastructure-as-code tools (e.g., Terraform, Ansible, Helm) and CI/CD pipeline management.
- Understanding of distributed systems principles, including fault tolerance and microservices reliability.
Preferred
- An active US Government security clearance is highly desirable.
- Experience with observability platforms and developing monitoring and alerting solutions.
Compensation and Benefits
This role offers competitive compensation and a comprehensive benefits package, including opportunities for professional growth and development in a mission-driven environment.