Splunk — Platform Operations & Cost Governance
Splunk is our established, widely adopted log management and SIEM platform. Your focus here is on sustaining reliability, controlling costs, and improving security, not building from scratch.
• Partner with the Security team to onboard new data sources — validating log formats, agreeing ingestion scope, and ensuring data pipeline stability throughout the onboarding process
• Own data ingestion cost governance: analyse ingestion volumes by team and source, identify waste or noise, and implement controls (event filtering, aggregation, summary indexing) to keep costs within budget
• Define and enforce ingestion standards — teams should ingest only what is needed for their alerting, compliance, and debugging use cases
• Monitor and maintain platform SLIs/SLOs: indexing throughput, search latency, forwarder health, and data availability
• Drive Splunk security improvements: access control reviews, index-level permissions, audit log monitoring, credential hygiene, and alignment with Tesco security policies
• Respond to and resolve platform incidents; maintain runbooks and capacity forecasts
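As an illustration of the ingestion analysis described above, the following minimal Python sketch totals daily ingest by team and flags teams that exceed an allocation. The record layout, team names, volumes, and budgets are all hypothetical; in practice the figures would come from the platform's own usage data.

```python
from collections import defaultdict

# Hypothetical daily ingest records: (team, source, gigabytes ingested).
SAMPLE_USAGE = [
    ("payments", "app-logs", 120.0),
    ("payments", "debug-logs", 80.0),
    ("search", "app-logs", 60.0),
    ("search", "access-logs", 30.0),
]

# Hypothetical per-team daily budgets in GB.
BUDGET_GB = {"payments": 150.0, "search": 100.0}


def ingest_by_team(records):
    """Sum ingested gigabytes per team across all of its sources."""
    totals = defaultdict(float)
    for team, _source, gigabytes in records:
        totals[team] += gigabytes
    return dict(totals)


def over_budget(totals, budgets):
    """Return {team: overage_gb} for teams above their budget.

    Assumption: a team with no agreed budget is treated as having a budget of 0.
    """
    overages = {}
    for team, used in totals.items():
        allowed = budgets.get(team, 0.0)
        if used > allowed:
            overages[team] = used - allowed
    return overages
```

Calling `over_budget(ingest_by_team(SAMPLE_USAGE), BUDGET_GB)` on the sample data flags only the hypothetical `payments` team, which is the kind of signal that would prompt a conversation about filtering or summary indexing.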
New Relic — Rollout, AI Features & Cost Control
New Relic is in active rollout across Tesco's engineering teams. You will be a key driver of adoption, new capability delivery, and platform governance.
• Lead the rollout of New Relic Anomaly Detection to engineering teams — running discovery workshops, configuring baselines and sensitivity settings, and validating signal quality before handover
• Explore, pilot, and implement New Relic AI features, including AI-assisted alerting, Applied Intelligence, and MCP (Model Context Protocol) integrations, and build a roadmap for wider adoption
• Manage data ingestion within contracted limits: identify high-volume sources, work with teams to tune instrumentation, and implement ingest drop rules or sampling strategies as needed
• Monitor and maintain New Relic platform SLIs/SLOs: agent coverage, ingest pipeline health, alert reliability, and dashboard availability
• Drive New Relic security improvements: user access governance, API key lifecycle management, SAML/SSO configuration, and audit log review
• Act as the primary escalation point for teams consuming New Relic — supporting onboarding, troubleshooting instrumentation gaps, and advising on NRQL-based alerting and dashboarding
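One sampling strategy of the kind mentioned above is deterministic, hash-based sampling, where the keep or drop decision depends only on a stable identifier. The sketch below is a generic illustration in Python, not New Relic's drop-rule mechanism; the function name and the use of a trace ID as the key are assumptions.

```python
import hashlib


def keep_event(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all trace IDs.

    The same trace ID always gets the same decision, so every event
    belonging to one trace is kept or dropped together.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto the range [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)
```

Because the decision is a pure function of the identifier, different services can apply the same rate independently and still agree on which traces survive.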
Runscope — Ongoing Operations & Security
Runscope is already rolled out to teams for API monitoring. Your focus is on maintaining reliability, improving security, and ensuring SLOs are met.
• Ensure Runscope API monitoring operates within defined SLIs/SLOs: test pass rates, alert latency, and integration uptime
• Work with teams to review and improve existing test suites — coverage gaps, flaky tests, and alerting thresholds
• Drive Runscope security improvements: access control reviews, token rotation, integration security, and alignment with Tesco security standards
• Support teams in extending API monitoring coverage as new services are onboarded
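To make the SLO language above concrete, here is a minimal sketch of an error-budget calculation for a test pass-rate SLI. The function name and the example numbers are illustrative assumptions, not agreed targets.

```python
def error_budget_remaining(slo_target: float, total_runs: int, failed_runs: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no failures so far; 0.0 means the budget is exactly used up;
    a negative value means the SLO has been breached for the period.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total_runs
    return 1.0 - failed_runs / allowed_failures
```

For example, with a hypothetical 99% pass-rate target over 10,000 test runs, 50 failures would leave roughly half of the budget, while 200 failures would overspend it.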
OpenTelemetry — Infrastructure Onboarding Enablement
A key growth area for the team is helping infrastructure teams adopt observability by deploying and managing the OpenTelemetry (OTel) Collector as the standard telemetry pipeline.
• Design and maintain OTel Collector configurations for infrastructure teams — covering metrics, traces, and logs across on-prem, cloud, and hybrid environments
• Build deployment patterns (Helm charts, configuration templates, automation scripts) that make it easy for infra teams to onboard self-sufficiently
• Manage OTel pipelines routing telemetry to Splunk, New Relic, or both — implementing processors, filters, and exporters as needed
• Document onboarding guides and runbooks; run enablement sessions with infrastructure teams
• Troubleshoot telemetry gaps, data quality issues, and collector performance problems
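A common source of collector problems is a pipeline that references a component that was never declared. The sketch below shows one way a configuration template could be checked before rollout, assuming the Collector configuration has already been loaded into a Python dictionary. The helper name, endpoints, and the deliberately incomplete example configuration are hypothetical.

```python
def undefined_components(config):
    """List (pipeline, section, name) entries used in a pipeline but not declared."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            declared = config.get(section) or {}
            for name in pipeline.get(section, []):
                if name not in declared:
                    problems.append((pipeline_name, section, name))
    return sorted(problems)


# Hypothetical, incomplete configuration: logs go to Splunk, traces to an
# OTLP endpoint. Endpoints are placeholders and required settings are omitted.
EXAMPLE_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": {}}}},
    "processors": {"batch": {}},
    "exporters": {
        "splunk_hec": {"endpoint": "https://splunk.example.invalid:8088/services/collector"},
        "otlphttp": {"endpoint": "https://otlp.example.invalid"},
    },
    "service": {
        "pipelines": {
            "logs": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["splunk_hec"]},
            # "filter/noise" is referenced here but never declared above.
            "traces": {"receivers": ["otlp"], "processors": ["batch", "filter/noise"], "exporters": ["otlphttp"]},
        }
    },
}
```

Running `undefined_components(EXAMPLE_CONFIG)` reports the undeclared `filter/noise` processor in the `traces` pipeline. A check like this could sit in the automation that ships configuration templates to infrastructure teams.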
Platform Governance & Mentorship
• Maintain SLI/SLO definitions and dashboards for all three observability tools (Splunk, New Relic, and Runscope); report platform health to the Engineering Manager
• Contribute to security reviews across Splunk, New Relic, and Runscope — proactively identifying gaps and owning remediation
• Mentor SE1 and SE2 engineers: code and configuration reviews, pair debugging, structured feedback aligned to their development plans
• Represent the Observability team in architecture forums, vendor conversations, and cross-functional planning
• Assist the Engineering Manager with hiring: technical screen design and participation in interview panels