Rackner

MLOps Engineer — AI/ML Systems & Deployment (TS/SCI Preferred)

Rackner  •  Dayton, OH (Onsite)  •  10 days ago

Job Description

MLOps Engineer — AI/ML Systems & Deployment (TS/SCI Preferred)
Dayton, OH (On-site Preferred) | Remote Eligible (CAC-Ready Candidates)
Mission Environment | AI/ML Infrastructure | National Security Impact

About the Role

At Rackner, we are building the operational backbone that turns AI/ML capability into real-world mission outcomes. We are seeking an MLOps Engineer to own the lifecycle of AI/ML systems—from experimentation to deployment—within a mission-critical, classified environment supporting Air Force and NASIC-aligned programs.

This is not a research role; it is where models become reliable, deployable, auditable systems.

You will operate at the intersection of:

  • Machine learning
  • Distributed systems
  • Cloud-native infrastructure

…and ensure that AI/ML systems work in the environments where failure is not an option.

What You’ll Do

Own the ML Lifecycle (End-to-End)

  • Build and operate production-grade ML pipelines
  • Orchestrate workflows using Kubeflow, Airflow, or Argo
  • Implement model versioning, lineage, and reproducibility standards

Operationalize AI/ML Systems

  • Deploy models into mission environments (including constrained or classified systems)
  • Transition workflows from Jupyter experimentation → containerized pipelines → production systems
  • Enable both batch and real-time inference architectures

Engineer for Reliability, Not Just Performance

  • Design systems for reproducibility, auditability, and stability
  • Implement monitoring for:
    • model performance & drift
    • system health & latency
  • Use tools like Prometheus, Grafana, and OpenTelemetry

Build Cloud-Native ML Infrastructure

  • Deploy and manage Kubernetes-based ML workloads
  • Containerize pipelines using Docker / OCI standards
  • Scale compute for training and inference workloads

Establish Data Discipline

  • Enable data versioning and governance (lakeFS or similar)
  • Support feature engineering and dataset preparation pipelines
  • Apply metadata standards (e.g., STAC) where applicable

Create Repeatable Systems

  • Develop runbooks, playbooks, and deployment standards
  • Build systems that can be operated by others, not just understood by you

What You Bring

Core Experience

  • Experience deploying ML systems into production environments
  • Strong background in Python and ML frameworks (PyTorch, TensorFlow, etc.)
  • Hands-on experience with:
    • ML pipeline orchestration tools (Kubeflow, Airflow, Argo)
    • Experiment tracking (MLflow, ClearML)

Infrastructure & Systems

  • Experience with Kubernetes and containerized workloads
  • Familiarity with CI/CD for ML systems
  • Understanding of distributed systems and scalable architectures

ML Application Exposure

  • Experience working with:
    • LLMs or transformer-based models
    • computer vision systems (YOLO, Faster R-CNN)
  • Focus on deployment and integration, not pure research

Mindset

  • Systems thinker who values reliability over novelty
  • Comfortable operating in ambiguous, high-stakes environments
  • Able to translate experimental work into operational capability

Why This Role Matters (What You Get)

This role is a career accelerator for engineers who want to:

  • Move beyond experimentation
    • Own systems that actually get deployed and used
  • Operate at the systems level
    • Work across ML, infrastructure, and mission integration
  • Build in high-trust environments
    • Where correctness, auditability, and reliability matter
  • Develop rare, high-demand expertise
    • MLOps in constrained / classified environments is a differentiated skillset

Shape how AI is operationalized, not just built.

Who We Are

Rackner is a software consultancy that builds cloud-native solutions for startups, enterprises, and the public sector. We are an energetic, growing consultancy with a passion for solving big problems across industries.

We enable digital transformation through:

  • Distributed systems
  • DevSecOps
  • AI/ML
  • Cloud-native architecture

Our approach is cloud-first, cost-effective, and outcome-driven—focused on delivering real capability, not just code.

Benefits & Perks

  • 100% covered certifications & training aligned to your role
  • 401(k) with 100% match up to 6%
  • Highly competitive PTO
  • Comprehensive Medical, Dental, Vision coverage
  • Life Insurance + Short & Long-Term Disability
  • Home office & equipment plan
  • Industry-leading weekly pay schedule

Apply

If you’re an engineer who wants to move from building models → owning systems, we want to talk.

#MLOps #MachineLearning #Kubernetes #AIEngineering #CloudNative #DevSecOps #ArtificialIntelligence #DataEngineering #DefenseTech #NationalSecurity #AIInfrastructure #Hiring #TechCareers


About Rackner

Rackner builds cutting-edge solutions that apply DevSecOps and the power of AI in the datacenter, public and private clouds, and at the edge, leveraging the future of compute capability and technologies like Kubernetes (k8s) and WebAssembly (WASM). We are a member of the Cloud Native Computing Foundation and a Kubernetes Certified Service Provider, as well as a partner to the major public cloud companies.

Our customers include hypergrowth startups and federal agencies, both Civilian and Defense.

Core Competencies

- DevSecOps

- Edge Computing

- AI/ML

- Cloud-Native and Hybrid-Cloud development

- Web and Mobile Application Development (Microservices)

Industry: IT & Software
Company Size: 11-50 employees
Headquarters: Silver Spring, Maryland
Year Founded: 2015