
Deep Learning Algorithms Engineer - ACOT

NVIDIA  •  Ho Chi Minh City, VN (Onsite)  •  5 days ago

Job Description

NVIDIA is seeking a motivated AI Acceleration & Optimization Engineer to join our Acceleration Computing, Optimization and Tools (ACOT) team. In this role, you will help improve the performance, scalability, and efficiency of modern AI models across NVIDIA GPU platforms. You will work with engineers across algorithms, systems, and hardware to support high-performance model deployment and development for real-world AI workloads.

As part of ACOT, you will collaborate with architecture, research, CUDA, compiler, and framework teams to help bring next-generation AI workloads from research to production with strong performance and reliability.

What you will be doing

  • Assist in optimizing AI models such as LLMs, VLMs, diffusion models, and multimodal models for inference and training on NVIDIA GPUs.
  • Profile workloads and help identify performance bottlenecks across GPU compute, memory, networking, and storage.
  • Support the development and integration of optimization techniques such as quantization, kernel fusion, parallelism, and memory efficiency improvements.
  • Use tools including CUDA, TensorRT, Nsight, and NVIDIA acceleration libraries to analyze and improve model performance.
  • Work with deep learning frameworks including PyTorch, JAX, and TensorFlow, as well as open-source inference frameworks like vLLM and SGLang.
  • Contribute to performance benchmarking, testing, and internal tooling to improve optimization workflows.
  • Partner with senior engineers and multi-functional teams to evaluate workload behavior and support future performance improvements.

What we want to see

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
  • 2–4 years of experience, or strong academic/project experience, in deep learning, performance engineering, systems, or high-performance computing.
  • Good understanding of deep learning fundamentals and modern AI model architectures, especially transformers.
  • Familiarity with GPU architecture and parallel computing concepts such as CUDA, kernels, memory hierarchy, and streams.
  • Exposure to profiling and performance analysis tools.
  • Programming skills in Python.
  • Experience with at least one major ML framework such as PyTorch, TensorFlow, or JAX.

Ways to stand out from the crowd

  • Internship, research, or project experience optimizing AI/ML workloads on GPUs.
  • Hands-on experience with TensorRT, TensorRT-LLM, vLLM, SGLang, or similar inference/runtime frameworks.
  • Familiarity with quantization, sparsity, or mixed-precision techniques.
  • Experience with distributed training or inference concepts.
  • Contributions to open-source ML systems, performance tools, or infrastructure projects.
  • Proficiency in C++, strong debugging skills, and an interest in low-level performance optimization.

About NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

Industry: Hardware & Semiconductors
Company Size: 10,000+ employees
Headquarters: Santa Clara, CA
Year Founded: 1993