10Pearls

Site Reliability Engineer (Lead)

10Pearls  •  Islamabad, PK (Onsite)  •  1 month ago

Job Description

10Pearls is an award-winning end-to-end digital innovation company that helps businesses imagine and build the future. We are proud that 10Pearls won the Timmy Award for Best Tech Work Culture in Washington DC from Tech in Motion, was recognized on the Inc. 5000 Fastest-Growing Companies list, and was ranked the #1 Most Diverse Midsize Company in Greater Washington. We partner with businesses to help them transform, scale, and accelerate by adopting digital and exponential technologies. Our work ranges from creating highly usable, secure digital experiences and mobile and software products to helping businesses modernize through cloud adoption and development and the digitalization of their business processes. Our clientele is highly diverse, including Global 1000 enterprises, mid-market businesses, and high-growth start-ups. But those are just facts. What makes us unique is that we have a true heart and soul. We have a strong focus on a double bottom line, and we actively support and engage with the communities where we live and work to make the world a better place. In a nutshell, we believe in doing well while doing good, and we know how to balance the two.

Role:

As an SRE Lead, you will be responsible for owning and scaling the organization’s core platform infrastructure, ensuring high availability and reliability of distributed systems. You will manage the Kubernetes-based substrate along with key components like identity, secrets, storage, registry, and gateway systems.

You will drive SLO frameworks, incident response processes, and production reliability through on-call practices, postmortems, and error budget management. Additionally, you will co-own release strategies and lead initiatives around performance, capacity planning, and system resilience.

You will work closely with engineering and platform teams to enforce infrastructure standards, automate operations, and lead a high-performing platform squad in building reliable, scalable systems.

Key Responsibilities

  • Substrate operation — own the Kubernetes cluster plus Keycloak (identity), Vault (secrets), MinIO (object storage), Harbor (registry), Kong (gateway) — from bootstrap to day-2 operations.

  • SLO framework — define, publish, and defend SLOs for every tier-1 service; own error budgets and burn-rate alerting.

  • Incident response — build the on-call rotation, paging, runbook library, and postmortem culture; lead incident command during P1/P2 events.

  • Release operations — co-own the blue-green / canary release model with L6 Delivery; sign off production-bound releases.

  • Reliability engineering — drive capacity planning, chaos testing, load testing (200 concurrent users target), and performance tuning across layers.

  • Air-gap operations — ensure every operational runbook works in a fully offline environment — no assumption of external dependencies.

  • Lead the Platform squad — technically lead 1 Infrastructure Engineer, 1 Observability Engineer, 2 DevOps Engineers; set standards for infra-as-code and automation.

Required Qualifications & Skills

  • Bachelor's degree in computer science or related field.

  • 5–8 years in SRE or production-engineering roles running distributed systems at scale.

  • Deep Kubernetes expertise — operators, RBAC, network policy, storage, upgrades.

  • Hands-on with Keycloak / Vault / MinIO / Harbor / Kong or equivalent identity/secrets/storage/registry/gateway stacks.

  • Strong Linux fundamentals and at least one systems language (Go, Rust) or shell/Python for tooling.

  • Proven SLO/SLI authorship and error-budget-driven decision-making.

  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, Loki, Tempo).

  • Calm, clear communication during incidents; strong post-mortem writing.

  • Hands-on with infra-as-code — Helm, Kustomize, Terraform.

Nice to Have

  • Prior experience running air-gapped or on-prem platforms for regulated customers

  • Cilium/Istio service-mesh operation

  • GitOps delivery with ArgoCD or Flux

  • FinOps / cost-attribution experience

  • Certified Kubernetes Administrator (CKA) or equivalent

About 10Pearls

10Pearls is a global full-service AI consultancy and digital technology partner helping businesses re-imagine, digitalize, and accelerate. As an end-to-end digital partner, 10Pearls helps businesses create transformative digital products incorporating exponential tech (AI/ML, Blockchain, IoT, NLP, AR/VR).

Our broad expertise in product management, user experience/design, cloud architecture, software development, data insights and intelligence, cybersecurity, emerging tech, and quality assurance ensures that we deliver solutions that address business needs. 10Pearls' clients include Global 2000 enterprises, high-growth mid-size businesses, and some of the most exciting start-ups across several industries, including healthcare, financial services, energy, education, real estate, retail and hi-tech. Headquartered in the Washington DC metro area, 10Pearls has 1,300+ experts across delivery centers in North America, Latin America (Costa Rica, Peru & Colombia), the United Kingdom, and South Asia. The Washington Post has referred to 10Pearls as a double-bottom-line company that balances profits with a social cause.

You are in good company:

AARP • Coca-Cola • Capital One • PayPal • Hughes • Adobe • Docker • Sprint • MedStar Health • Corcentric • Discovery Education • Johnson & Johnson • Zubie • Blackboard • National Geographic • JK Moving • General Dynamics

Awards & Industry Recognition:

• 6x Inc. 5000 List of Fastest-Growing Private Companies in the U.S.

• 2x Built In list of Best Places to Work

• 2x Exelon IT Honor Roll for Diversity, Equity & Inclusion (DEI)

• 2022 Financial Times Recognition for America's Fastest Growing Companies

• 2x Timmy Award winner for Best Tech Culture

• Ernst & Young Entrepreneur of the Year 2020, Mid-Atlantic Finalist (CEO)

• Forrester: Featured AI Consultancy, Top Partner for Custom Software Development & Digital Transformation Service Provider

• Gartner: AI Consulting & System Integration & Featured for Agile & DevOps Services

Industry: IT & Software
Company Size: 1,001-5,000 employees
Headquarters: Vienna, Virginia