
Job Description

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

About the Technology Organization

Technology at Lilly builds and maintains capabilities using pioneering technologies like most prominent tech companies. What differentiates Technology at Lilly is that we create new possibilities through tech to advance our purpose – creating medicines that make life better for people around the world, like data-driven drug discovery and connected clinical trials. We hire the best technology professionals from a variety of backgrounds, so they can bring an assortment of knowledge, skills, and diverse thinking to deliver solutions in every area of our business.

About the Business Function

The Software Product Engineering (SPE) team is a specialized engineering group that delivers strategic solutions and differentiated capabilities. We take a forward-thinking approach, focusing on an enterprise platform and product mindset, ensuring that the solutions we build can be leveraged across Technology teams for broader impact and efficiency.

As a Principal Support Engineer – Operations (R3), you will be the senior technical authority for production support across a suite of products and services. You will lead complex incident resolution, drive systemic reliability improvements, and influence operational standards across teams. This role expands beyond advanced troubleshooting to include end-to-end ownership of major incidents, deep technical remediation, automation to reduce operational toil, and mentoring of support engineers. You will partner closely with Engineering, Product, QA, Security, and Platform teams to ensure resilient services, strong operational readiness, and measurable improvements in uptime, latency, and customer experience.

What You’ll Be Doing (Key Responsibilities)

1) Advanced Incident Leadership & Resolution

Act as the final escalation point for the most complex, high-impact production issues spanning frontend, backend, integrations, data stores, and cloud infrastructure.

Lead major incident response (swarming/war-room execution), including triage strategy, technical direction, and recovery coordination across multiple teams.

Drive consistent incident execution aligned with incident management expectations (escalation, outage/deviation considerations, and appropriate stakeholder visibility).

2) Problem Management, RCA, and Defect Elimination

Own and drive Root Cause Analysis (RCA) for recurring and severe incidents; identify systemic failure patterns and champion long-term fixes over workarounds.

Partner with engineering to translate RCA outcomes into durable changes (code, configuration, architecture, monitoring, or process), and track fixes to closure with measurable reliability impact.

3) Reliability Engineering & Operational Excellence

Lead initiatives to improve availability, performance, scalability, and operational resilience (e.g., reducing MTTR, improving detection, reducing repeat incidents).

Define and implement operational guardrails: readiness checks, runbooks, rollback patterns, post-release validation, and shift-left operational readiness with Dev/QE.

Contribute to or lead stabilization work consistent with engineering/SRE responsibilities (reliability improvements, defect elimination, major-incident swarming).

4) Observability, Monitoring & Automation

Design and evolve observability across logs/metrics/traces; improve signal quality (actionable alerts, noise reduction, meaningful dashboards).

Build automation for common operational tasks (triage, remediation, reporting), using scripting and tooling to reduce manual effort and improve consistency.

5) Deployment & Change Support

Provide senior support for deployments/releases: risk assessment, go/no-go input, rollback readiness, and rapid response for post-release issues.

Improve CI/CD operational safety through better validation, monitoring hooks, and release checklists in partnership with DevOps/Platform teams.

6) Compliance, Security & Regulated Environment Readiness

Ensure support processes and fixes align with internal standards and external regulations (e.g., GDPR, HIPAA where applicable).

Promote secure operational practices: least privilege, auditability, secure debugging, and appropriate handling of sensitive data during incident response.

7) Knowledge Leadership & Mentoring

Raise the operational bar by creating and governing high-quality runbooks, knowledge base articles, and operational standards; ensure reusability and adoption across teams.

Mentor L2/R2 engineers: technical coaching, incident handling patterns, RCA quality, and effective cross-team collaboration—acting as a role model for knowledge sharing.

How You Will Succeed (Success Profile)

At R3, success is measured not only by resolving incidents, but by preventing them, improving reliability at scale, and influencing standards across teams:

Be a recognized technical expert who solves complex problems and introduces improved methods/approaches for operations and reliability.

Lead technical decisions during incidents and influence operational standards, technical direction, and cross-team alignment.

Demonstrate strong systems thinking: understand failure modes across distributed services, data stores, networks, and cloud infrastructure.

Drive measurable outcomes (examples): reduced repeat incidents, improved alert quality, lower MTTR, improved SLO attainment, reduced manual toil.

Communicate crisply under pressure, facilitating fast alignment between engineering, product, and stakeholders during major incidents.

What You Should Bring (Qualifications)

Required

7–10 years of experience in application support, production engineering, SRE, or software engineering with strong operations ownership (including high-severity incident response).

Deep hands-on debugging across web applications (frontend + backend), integrations, and production environments.

Strong experience with incident management and ticketing workflows (e.g., ServiceNow, Jira), including major incident execution and RCA.

Strong knowledge of RESTful APIs, databases (e.g., PostgreSQL), caching/data stores (e.g., Redis), and cloud platforms (AWS/Azure/GCP).

Expertise in monitoring/logging/alerting stacks (e.g., CloudWatch, ELK, Datadog, Splunk/AppDynamics or equivalent) and the ability to build actionable observability.

Advanced scripting/automation capability (e.g., Bash, Python, JavaScript) to reduce toil and standardize response.

Experience supporting products in regulated industries; working knowledge of privacy/security expectations and secure handling.

Strong collaboration and communication skills across Dev, QA, Product, Security, and platform teams.

Preferred / Nice to Have

Experience defining and operationalizing SLIs/SLOs, error budgets, and reliability reporting (SRE ways of working).

Experience with containerization and deployment patterns (Docker/Kubernetes/ECS), CI/CD systems, and infrastructure-as-code concepts.

Demonstrated mentoring/leadership: raising the capability of teams through coaching and standards.

Additional Information

Availability to work flexible hours is/may be required. This team will support continuous operations across two shifts; therefore, this role will require non-standard work hours and some work on weekends and holidays. Appropriate adjustments in benefits will be provided for employees working non-standard hours where applicable.

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

About Eli Lilly and Company

We're a medicine company turning science into healing to make life better for people around the world. It all started nearly 150 years ago with a clear vision from founder Colonel Eli Lilly: "Take what you find here and make it better and better." Harnessing the power of biotechnology, chemistry and genetic medicine, our scientists are urgently advancing science to solve some of the world's most significant health challenges.

General Information and Guidelines:

When you engage with us on LinkedIn, you're agreeing to these Community Guidelines: https://e.lilly/guidelines.

If you have questions about a Lilly medicine, contact The Lilly Answers Center at 1-800-Lilly-Rx (1-800-545-5979) Monday through Friday, excluding company holidays.

Industry

Chemicals & Materials

Company Size

10,000+ employees

Headquarters

Indianapolis, Indiana

Website

lilly.com
