AI Success™

Job Description

Introduction to the Role

Transform AI into a true force multiplier for enterprise operations! This role shapes how machine learning and artificial intelligence platforms are run, automated, and improved across Azure and AWS to support critical scientific and business outcomes. The focus is on defining system architecture, monitoring, and automation so that operations itself becomes a benchmark for AI adoption.

This position manages a layered operations framework, spanning L1 runbook operators, L2 site reliability engineers (SREs), and an L3 product engineering interface, and drives continuous improvement across it. The goal is to eliminate manual toil, increase first-line resolution, and free engineering time for higher-value work. Every incident must result in a detailed runbook, an automation, or a permanent fix.

This is not standard support; it is a leadership role for an engineering-minded operator setting precise technical direction!

Accountabilities

Operational Model Ownership: Lead and evolve the three-tier operations model for the AI/ML platform estate. Enforce operational readiness gates and run monthly reviews against clear metrics: L1 resolution rate, repeat incident rate, automation coverage, and toil budget compliance.
Technical Direction: Set instrumentation and alerting standards for a centralised observability layer (Datadog, New Relic, Grafana, Splunk). Guide the architecture of AI-augmented operations tooling, including conversational runbooks, and direct L2 SRE patch contributions that resolve root causes.
Automation Strategy: Mandate that every incident yields a runbook, an automation, or a patch. Define acceptable toil thresholds and prioritise automation investments by incident frequency, resolution time, and blast radius.
Team Leadership and Development: Direct the L2 SRE team, organised by cloud domain. Build a robust talent pipeline from L1 to L2 and foster a culture in which SREs work as engineers dedicated to operational excellence.
Stakeholder Interface: Partner with product engineering to co-own post-mortems, negotiate platform handovers, and embed operational requirements into early architecture decisions. Communicate platform health, risks, and investments clearly to senior leadership through data-driven narratives.

Essential Skills and Experience

Academic Background: BSc/MSc/PhD in Computer Science or a related analytical field.
Operations Leadership: Recent, hands-on experience building and leading large-scale SRE or platform operations functions.
Observability Expertise: Deep technical knowledge of platforms such as Datadog, New Relic, Grafana, or Splunk, covering dashboard development, alerting strategies, and telemetry pipeline architecture.
Modern Standards: Solid understanding of OpenTelemetry, distributed tracing, and structured logging.
Automation Delivery: A consistent track record of designing and implementing automation that materially reduces operational toil.
Cloud Infrastructure: Strong grasp of Azure and/or AWS, including container orchestration, serverless architectures, and managed services.
Incident Management: Proven ability to run post-mortem processes and translate findings into preventative engineering, along with experience defining platform handover criteria.
Technical Mentorship: Ability to give precise technical direction on system instrumentation, alert triggers, and automation interventions while developing high-performing teams.

Desirable Skills and Experience

AI/ML Operations: Application of AI/ML to operational challenges (e.g., intelligent alerting, automated diagnosis, conversational interfaces).
Workload Management: Experience operating platforms serving AI/ML workloads such as LLM inference, model serving, and data pipelines.
Industry Context: Familiarity with regulated pharmaceutical environments or the AstraZeneca technology estate.
Frameworks: ITIL, SRE, or operational excellence certifications.

Working Environment

Bringing unexpected teams together sparks bold thinking. To facilitate this, we operate on a hybrid model, working an average of three days per week from the office while respecting individual flexibility. Join us in our unique and ambitious world!

Why AstraZeneca

Advanced data and AI are embedded in how medicines are discovered, developed, and delivered. This leadership role directly improves speed, reliability, and safety at a global scale. Work alongside deep specialists, apply cutting-edge techniques with real-world impact, and help shape how AI is used daily. The culture values kindness alongside ambition, encourages clear thinking, and provides the space to publish breakthroughs that advance the field.

Equal Opportunity

AstraZeneca is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Call to Action

Lead the charge to build an AI-augmented operations engine that frees engineers to innovate and accelerates patient impact. Apply today to start the conversation!

#EAI

Date Posted

18-May-2026

Closing Date

04-Jun-2026

AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorization and employment eligibility verification requirements.

About AstraZeneca

We're transforming the future of healthcare by unlocking the power of what science can do for people, society and the planet. For more information, visit www.astrazeneca.com.

Community Guidelines: bit.ly/2MgAcio

Industry

Chemicals & Materials

Company Size

10,000+ employees

Headquarters

Cambridge, GB

Website

astrazeneca.com

Director, AI Operations