Senior Site Reliability Engineer (Player‑Coach)
Location: Hybrid (U.S.) Austin, TX or Cranberry Woods, PA
Department: Global Cloud Operations
Reports to: Vice President, Global Cloud Operations
Visa Sponsorship: No form of visa sponsorship is offered for this position. Must be a U.S. citizen or permanent resident.
Why Join Omnicell?
At Omnicell, reliability isn’t abstract: it directly impacts how medications are dispensed and how patient care is delivered. As Omnicell evolves from on‑premise, hardware‑centric products to a cloud‑native, SaaS‑delivered platform that hospitals depend on 24/7, we are building a Global Cloud Operations organization from the ground up. We are hiring our first Senior Site Reliability Engineer to define what “good” looks like for reliability at Omnicell. This is a rare opportunity to architect an SRE practice end‑to‑end (setting standards, making foundational technology decisions, and operating hands‑on in production) while partnering directly with senior leadership. If you’re energized by building, owning outcomes, and working in a regulated healthcare environment where reliability truly matters, this role was created for you.
About This Opportunity:
Omnicell is building a Global Cloud Operations organization from the ground up as our business shifts from on-premise, hardware-centric products to a cloud-native, SaaS-delivered platform that hospitals depend on 24/7. The Site Reliability Engineering function is the reliability engine of that organization, and this role is the first senior SRE hire — the person who will design the practice, set the standards, and then run the plays themselves until the team is large enough to delegate.
This is not a role where reliability practices already exist and you tune them. It is a role where you define what good looks like for Omnicell: which services have SLOs and at what targets, how incidents are declared and commanded, what the on-call rotation feels like, which observability platform we standardize on, and how reliability investment is prioritized against feature velocity. You will make those calls in partnership with the VP of Global Cloud Operations and an Engineer III SRE you will coach and grow.
The environment is hybrid. Some of our products are still hardware in hospitals communicating with cloud services; others are fully SaaS. Some customers access us over private circuits, others over the public internet. We operate in a regulated environment — HIPAA, SOC 2, and in some engagements FedRAMP — which means reliability, security, and auditability are not separable concerns. The person we hire will be comfortable with that complexity and will help the organization design for it rather than around it.
This role also anchors Omnicell's forward investment in AI-driven operations. Over the course of the first year, the organization intends to incorporate AIOps and ML-assisted observability — anomaly detection, intelligent alert correlation, LLM-assisted runbook generation — into how we monitor and respond to our platform. You will be the technical owner of how that gets introduced, prioritized against foundational reliability work, and validated in a regulated environment.
What You’ll Do
Purpose: Establish and operate Omnicell’s Site Reliability Engineering function, balancing hands‑on engineering with practice design, coaching, and cross‑functional leadership.
Primary Impact
You will ensure Omnicell’s Tier‑1 cloud services are observable, resilient, and dependable—so hospitals, pharmacies, and clinicians can rely on our platform without interruption.
Reliability Practice & Operating Model
Partner with the VP to migrate the interim incident response RACI — currently held by matrixed individuals across IT, Engineering, Support, and Enterprise Security — into a durable SRE-owned model.
Select and stand up the primary observability platform, preferring extension of existing Omnicell contracts (DataDog, IBM/Instana, Prometheus/Grafana, OpenTelemetry, or other tooling already in use) over net-new procurement. Define the instrumentation standards all new services must meet.
Hands‑On Engineering & Incident Leadership
Contribute production code and infrastructure‑as‑code (Terraform preferred) to the platform. Oversee the design and evolution of the CI/CD pipelines. The current stack is Codefresh, TeamCity, GitHub Actions, and Octopus Deploy, and we are consolidating over time.
Administer and scale our Kubernetes platform, including secure and compliant cluster configurations. Working knowledge of Docker, Helm, and Service Mesh (Istio or Linkerd) expected.
AI‑Driven Operations
Coaching & Team Building
What Success Looks Like in the First Six Months
Concrete outcomes this role will be evaluated against in the first half-year. These are drawn from the Cloud Ops 90-day plan and its extension into the following quarter.
Month 1: SLOs drafted for the top 5 Tier-1 services with Product sign-off. Severity rubric published. First live tabletop Sev-1 run against the interim RACI.
Month 2: Observability platform selection finalized. Instrumentation standard published. Engineer III SRE hired and onboarded.
Month 3: On-call rotation live. First real Sev-1 commanded under the new structure with a blameless postmortem completed and follow-ups tracked.
Months 4–6: Error budget policy in effect for the first 3 services. First incident review at executive level. Interview loop running for the next SRE hires. Initial AIOps evaluation and pilot scope defined.
Who You Are
Bachelor's degree in Computer Science, Engineering, or a related technical field.
7+ years of experience in software or platform engineering, with at least 4 of those in an SRE, DevOps, or platform reliability role.
At least 2 years of formal technical leadership, tech-lead, or staff-level experience with mentorship responsibilities.
Preferred Qualifications
Proven experience leading SRE, DevOps, or platform engineering teams in a cloud-native production environment — with demonstrated experience building a practice from zero or near-zero: you have set SLOs, defined incident command, and introduced error budget thinking to an organization that did not have it.
Deep hands-on expertise with at least one major public cloud (AWS, Azure, or GCP), including networking, IAM, and managed services.
Strong background in CI/CD pipeline design and management (familiarity with Codefresh, GitHub Actions, Jenkins, TeamCity, or equivalent).
Experience implementing Infrastructure as Code using Terraform (preferred), Chef, Puppet, or similar tools.
Proficiency in Python or another object-oriented programming language for automation, tooling, and production services.
Experience administering and scaling Kubernetes clusters, including secure and compliant platform configurations. Working knowledge of Docker, Helm, and Service Mesh technologies (Istio, Linkerd).
Hands-on experience designing modern observability platforms using tools such as DataDog, Prometheus, Grafana, OpenTelemetry, Elasticsearch/Kibana, or equivalent — with an opinion about what a good telemetry stack looks like.
Familiarity with integrating AI/ML-based anomaly detection, alerting, or LLM-assisted triage pipelines — or strong conviction about where AIOps should and should not be applied in a regulated environment.
Real incident command experience for customer-impacting Sev-1 events, with blameless postmortem practice and documented follow-up discipline.
Ability to coach and mentor, with direct evidence of growing junior and mid-level engineers. You will eventually have 1 direct report.
Comfort operating in a regulated environment where reliability and compliance (HIPAA, SOC 2) are inseparable.
How You’ll Elevate at Omnicell
At Omnicell, success is defined by both outcomes and behaviors. In this role, you will:
Leadership Imperatives (Player‑Coach Role)
This role will eventually have one less-senior Site Reliability Engineer as a direct report. You are expected to demonstrate Omnicell’s leadership expectations by:
Since 1992, Omnicell has been committed to transforming pharmacy care through outcomes-centric innovation designed to optimize clinical and business outcomes across all settings of care. We strive to be the healthcare provider’s most trusted partner by our guiding promise of “Outcomes. Defined and Delivered.”
Our comprehensive portfolio of robotics, smart devices, intelligent software, and expert services is helping healthcare facilities worldwide to improve business and clinical outcomes as they move closer to the industry vision of the Autonomous Pharmacy.
Our guiding principles inform everything we do:
We are deeply committed to Environmental, Social, and Governance (ESG) initiatives. Our ESG efforts focus on creating an inclusive culture and a healthier world. This includes our Employee Impact Groups, which foster inclusion and belonging, as well as our learning and well-being programs that support personal and professional growth. We also prioritize sustainability in our operations, aiming to reduce our environmental footprint and promote responsible business practices. Join us in transforming the pharmacy care delivery model, making patient care safer and smarter for all.

Omnicell is transforming pharmacy and nursing care through outcomes-centric solutions designed to optimize clinical and business outcomes across all settings of care. Our comprehensive portfolio of robotics and smart devices, intelligent software workflows, and data and analytics, all optimized by expert services, is helping healthcare facilities worldwide to reduce costs, improve labor efficiency, establish new revenue streams, enhance supply chain control, support compliance, and move closer to the industry vision of the Autonomous Pharmacy. To learn more, visit omnicell.com.