NVIDIA

Senior Performance Engineer

NVIDIA  •  State of Israel (Onsite)  •  14 days ago

Job Description

NVIDIA is seeking a highly skilled Senior Performance Engineer to join our Performance and R&D organizations. In this role, you will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments. You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems. This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.

What you’ll be doing:

  • Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters

  • Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE)

  • Identify performance bottlenecks across networking, compute, memory, and system architecture

  • Develop and enhance performance analysis, benchmarking, and diagnostic tools

  • Define performance test plans and establish expectations for new technologies and platforms

  • Collaborate across hardware, firmware, networking, systems, and software teams to provide actionable performance insights

  • Support telemetry collection and data refinement efforts to enable accurate performance analysis

  • Maintain high standards for data quality, reproducibility, and traceability of performance results

What we need to see:

  • B.Sc. or M.Sc. in Computer Science, Computer Engineering, Software Engineering, or equivalent experience

  • 5+ years of experience in performance analysis, systems engineering, or HPC/AI infrastructure

  • Demonstrated expertise in performance analysis methodologies

  • Hands-on experience with high-performance networking (RDMA, MPI, NCCL, congestion control)

  • Strong understanding of system performance metrics (latency, throughput, resource utilization)

  • Exposure to hardware, firmware, or embedded telemetry environments

  • Strong analytical, problem-solving, and communication skills

  • Ability to work effectively in cross-functional, fast-paced R&D teams

Ways to stand out from the crowd:

  • Knowledge of CUDA, NCCL internals, and congestion control algorithms

  • Deep system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe

  • Experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as PyTorch or TensorFlow

  • Experience with cloud platforms

  • Proficiency in Python and strong experience working in Linux environments; experience with Bash and C/C++ is a plus

About NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

Industry: Hardware & Semiconductors
Company Size: 10,000+ employees
Headquarters: Santa Clara, CA
Year Founded: 1993