Google

Software Engineer, TPU Compiler Development Infrastructure

Google  •  $147k - $211k/yr  •  Sunnyvale, CA (Onsite)  •  8 days ago

Job Description

Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 2 years of experience with coding in C++ and Python, or 1 year of experience with an advanced degree.
  • 2 years of experience working with Google infrastructure such as Blaze, TAP, or Guitar.

Preferred qualifications:

  • Master's degree or PhD in Computer Science, or a related technical field.
  • Interest in becoming an expert in infrastructure surrounding low-level ML hardware programming.

About the job

Our team develops the Accelerated Linear Algebra (XLA) compiler, which enables TPUs, Google's in-house custom-designed processors, to accelerate machine learning and other scientific computing workloads for both internal Google customers and external Cloud customers.

The XLA TPU team is reaching a critical threshold of complexity at a time when the demand for rapid iteration has never been higher. This role is designed to reduce the infrastructure friction that compiler engineers face daily, effectively multiplying the output of the entire team.

In concrete terms, we need to reduce the team's average presubmit latency from the current 1.5 hours to 20 minutes and minimize changelist (CL) rollbacks by catching issues early.

While this position does not require prior experience with compilers, hardware, or deep ML expertise, it does require someone who genuinely enjoys the craft of building great infrastructure that unblocks developer productivity.

The US base salary range for this full-time position is $147,000-$211,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Reduce time to submit for CLs and minimize CL rollbacks for the whole XLA TPU team. Drive infrastructure improvements that remove friction from the daily development work of the XLA TPU Compiler team.
  • Develop tools that support compiler engineers through the stages of introducing a new TPU (e.g., testing when hardware is not yet available or is very limited).
  • Modernize and simplify build/test fixtures (e.g., xla_test) to make them more reliable and easier for the team to use.
  • Design and implement system architectures that cleanly handle an ever-increasing number of TPU generations and compiler features, ensuring the codebase doesn't become a "spaghetti" of special cases.
  • Identify and resolve accelerator utilization bottlenecks, and improve accelerator test coverage without slowing down CL submission.

About Google

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.

Check out our career opportunities at goo.gle/3DLEokh

Industry
IT & Software
Company Size
10,000+ employees
Headquarters
Mountain View, CA
Year Founded
Unknown