Madiff

Site Reliability Engineer (AI)

Madiff  •  Remote  •  1 month ago

Job Description


This is a remote position.


We are looking for a Senior Site Reliability Engineer to support advanced AI platforms that run production-grade applications and pipelines. The role focuses on building and maintaining reliability, scalability, and operational excellence across multiple AI-driven systems.


The engineer will work on a central operational layer for monitoring and managing AI workloads, improving system stability, and reducing incidents. This is a hands-on role requiring direct involvement in diagnosing production issues, implementing fixes, and optimising monitoring, alerting, and CI/CD processes.


The position requires close collaboration with engineering teams to improve release quality, standardise telemetry, and ensure stable and predictable system behaviour in a distributed cloud environment.


Responsibilities


• Build and maintain a central monitoring and alerting layer for AI applications and pipelines

• Define and implement SLIs, alerts, and operational dashboards

• Manage incidents including triage, coordination, root cause analysis, and prevention

• Standardise telemetry across systems including latency, throughput, and failures

• Optimise CI/CD pipelines and introduce quality gates for reliability

• Work closely with engineering teams to reduce recurring issues and improve stability


Requirements


• At least 5 years of experience in SRE, Platform, or Production Engineering

• Strong hands-on experience with Kubernetes and production environments

• Experience with Azure and Azure DevOps

• Experience with monitoring tools such as Datadog

• Strong understanding of incident management and root cause analysis

• Ability to build practical monitoring and alerting systems

Nice to have

• Experience with AI or LLM pipelines

• Experience building monitoring platforms across multiple systems

• Experience with Grafana

• Experience working in large-scale or distributed environments

Expectations

• Strong ownership mindset and accountability for system stability

• Proactive approach to identifying risks and improvements

• Hands-on engineer who works directly with systems rather than only coordinating

• Comfortable working in dynamic and evolving environments


Benefits


• Solid, competitive salary

• Work in a multinational environment on international projects

• Comprehensive healthcare

• Long-term B2B contract with a stable project pipeline

• Work model: fully remote

About Madiff

We are an international Innovation, IT and high-tech engineering consulting company that delivers unique value in a wide variety of industries.

Our mission is to add value to our customers' businesses by providing digital and technological innovation services, delivering disruptive results and making our clients stand out in their market.

We are driven by a creative and innovative consulting approach strongly oriented to getting results. We MAKE THE DIFFERENCE.

Poland | UK | Switzerland | USA | Norway | Portugal | Spain | France | Singapore

Warsaw: Prosta 20

Wrocław: Rybacka 7

Lublin: Grottgera 2

Industry: IT & Software
Company Size: 51-200 employees
Headquarters: Warszawa, PL
Year Founded: 2015
Website: madiff.eu