ilert

Site Reliability Engineer (f/m/x)

ilert  •  Köln, DE (Hybrid)  •  5 months ago

Job Description

Location: Hybrid – Cologne (Rheinauhafen) — 3 days in the office, 2 remote (Tue + Thu)
Team: Engineering · Reports to CTO

Keep the world awake — build reliability at scale

ilert helps thousands of DevOps & IT teams detect, fix, and communicate incidents faster.

Our platform is mission-critical: customers rely on us 24/7 to keep their always-on businesses running.

As a Site Reliability Engineer at ilert, you’ll own the reliability, performance, and scalability of our core platform across AWS, Kubernetes, Kafka, and more.

Tasks

Build & operate a highly available platform

  • Run and evolve our AWS-based infrastructure
  • Operate and optimize self-managed Kafka and ClickHouse clusters and our observability stack
  • Ensure resilience, disaster recovery, and capacity planning across the stack

Improve reliability & performance

  • Build and maintain SLOs, SLIs, error budgets, and observability dashboards
  • Debug production issues across layers (networking, Kubernetes, application, DB)
  • Improve performance of our ingestion pipeline

Automation & tooling

  • Automate operations with Terraform, Helm, Kubernetes operators, and internal tooling
  • Build tooling for safer deploys, blue/green rollouts, and automated verification
  • Strengthen incident response workflows through deep collaboration with our AI SRE agent team

Security & compliance

  • Implement best practices for workload isolation, secrets management, IAM, and auditability
  • Support our ISO27001 posture by automating controls and hardening our infrastructure

Cross-functional impact

  • Partner with Backend, AI, and Product teams to design reliable services
  • Participate in on-call rotation
  • Lead post-incident reviews and drive long-term reliability improvements

Requirements

  • 3+ years of experience as an SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer
  • Strong hands-on experience with AWS, Kubernetes, Linux internals, networking, performance tuning
  • Experience operating self-managed distributed systems, ideally Kafka or ClickHouse
  • Strong understanding of observability
  • Experience automating infrastructure with Terraform and CI/CD systems
  • Fluent English (our working language); German optional

Benefits

  • 🚀 Product-centric - 100% focused on solving a mission-critical pain felt by every always-on business
  • 🏡 Hybrid freedom - 2 days remote by default; gorgeous Rheinauhafen roof terrace when you’re in town
  • 🕒 Focus > meetings - We time-box syncs, favour async docs and protect maker time
  • 🌴 28 days off - …plus public holidays
  • 🚲 Commute perks - subsidised public transport

About ilert

ilert is an AI-first company, offering an all-in-one incident management platform for alerting, on-call management, and incident communication to help companies increase their digital uptime. B2C and B2B companies from across the globe, including well-known brands such as Bertelsmann, IKEA, and REWE, trust ilert to empower their operations teams and ensure everything is running smoothly.

Industry: IT & Software
Company Size: 11-50 employees
Headquarters: Cologne, DE
Year Founded: 2011
Website: ilert.com