Job Description
About Ontrac Solutions
At Ontrac Solutions, we partner with elite engineering organizations to build systems that operate at planetary scale. Our team supports complex cloud, infrastructure, automation, and production engineering initiatives for organizations modernizing critical platforms and high-availability environments.
We are seeking a highly skilled Senior Production Engineer — IC4 to support a critical customer engagement. This role is ideal for a hands-on engineering professional with deep experience in infrastructure modernization, Linux systems, Python automation, production support, and large-scale migration execution.
The Senior Production Engineer will work closely with Cloud Platform Engineering, CloudTech SRE, internal engineering teams, and customer stakeholders to support the modernization of legacy infrastructure into production-ready environments.
This individual will help lead complex operating system upgrades, packaging migrations, configuration management transitions, observability improvements, CI/CD hardening, and service onboarding efforts across a large-scale infrastructure footprint.
The ideal candidate is comfortable executing independently, owning technical workstreams, resolving complex production issues, and documenting repeatable processes for long-term operational success.
Key Responsibilities
Lead and execute large-scale OS modernization efforts, including migrations from RHEL7 to EL8/EL9 across approximately 1,700 systems and virtual machines.
Support configuration management transitions from Chef to CINC, and legacy packaging and configuration migrations from yinst to RPM.
Build, maintain, and configure RPM packages to support infrastructure modernization and application migration efforts.
Develop, execute, and improve automated runbooks for OS upgrades, configuration changes, service onboarding, and production support.
Triage, own, and resolve complex production issues, including high-priority S-bugs and infrastructure-related incidents.
Harden CI/CD pipelines, observability frameworks, and rollout/rollback mechanisms for legacy-to-modern infrastructure transitions.
Partner closely with CloudTech SRE to provide follow-the-sun Tier-2 production support, including hands-on incident response and break/fix operations.
Onboard services to modern monitoring, logging, and observability stacks.
Support migrations from legacy monitoring tools such as Yamas to platforms such as Chronosphere, Prometheus, and Grafana.
Assist with log management and Splunk integration strategies.
Partner with application development teams during cloud cutovers, component migrations, and production readiness activities.
Automate repetitive operational tasks using Python and related tooling.
Document technical procedures, runbooks, migration steps, and operational standards.
Required Qualifications
5+ years of professional software engineering, production engineering, SRE, DevOps, or infrastructure engineering experience.
Strong hands-on experience with Python for automation, tooling, scripting, and operational workflows.
Experience supporting Linux infrastructure in production environments, ideally including RHEL7, EL8, and EL9.
Experience with OS modernization, infrastructure migration, or large-scale systems upgrade initiatives.
Hands-on experience with package management and build processes, preferably including RPM packaging.
Experience with configuration management tools such as Chef, CINC, Ansible, Puppet, or similar platforms.
Strong understanding of production support, incident response, break/fix workflows, and Tier-2 operational support.
Experience hardening CI/CD pipelines and supporting safe rollout/rollback processes.
Familiarity with observability, monitoring, logging, and alerting frameworks.
Ability to work independently, manage technical tasks, and communicate clearly with engineering and stakeholder teams.
Strong documentation skills and the ability to create repeatable runbooks and operational procedures.
Preferred Qualifications
Experience with Chef to CINC migrations.
Experience with yinst to RPM migration or similar legacy packaging transitions.
Experience supporting monitoring migrations from Yamas to Chronosphere, Prometheus, or Grafana.
Experience with Splunk log management strategy and integration.
Experience supporting developers through cloud cutovers and application migration phases.
Experience working with Cloud Platform Engineering, SRE, or infrastructure modernization teams.
Familiarity with NetAuto or similar network automation and operational support tooling.
Experience operating in a follow-the-sun support model.
Prior experience supporting high-scale cloud, infrastructure, or platform engineering environments.
Scope of Work / Delivery Expectations
The contractor will help drive the technical transition of legacy systems to modern infrastructure environments. Expected workstreams include:
Migrating and updating configurations across approximately 1,700 systems and virtual machines from RHEL7 to EL8/EL9.
Developing and executing automated runbooks for OS upgrades and configuration management changes.
Building and maintaining RPM packages to replace legacy configuration and packaging processes.
Supporting the transition of monitoring infrastructure to a modern observability stack, including Chronosphere, Prometheus, and Grafana.
Supporting Splunk integration and logging strategies.
Providing Tier-2 operational support and incident response under a follow-the-sun model.
Partnering with application developers during cloud migration and cutover phases.
Improving CI/CD pipelines, deployment safety, and rollback readiness.
Creating documentation to support repeatable operational processes and long-term platform maintainability.