
In this role, you will lead the team responsible for building reliability into our products from early architecture through global deployment. You will shift our focus from reactive troubleshooting to scalable strategy, partnering with Design teams and APAC manufacturers to define specifications and mitigate hardware risks before they hit production. Ultimately, you will own the technical strategy for NPI reliability frameworks, drive systemic root-cause failure analysis, and oversee the health of our active global fleet to ensure our infrastructure remains highly resilient.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving team behind Google's groundbreaking innovations, empowering the development of our AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
Check out our career opportunities at goo.gle/3DLEokh