NVIDIA

Manager, Network Simulation and Infrastructure

NVIDIA  •  Tel Aviv, IL (Onsite)  •  5 months ago

Job Description

NVIDIA is searching for a strong technical leader to own the backbone of our Networking Research capabilities. We are looking for an Engineering Manager to lead the development of our high-fidelity Network Simulation platform and the extensive on-premise infrastructure that powers it.

In this role, you will lead a team of performance simulation software engineers and DevOps/Infrastructure specialists. You will own the "Simulation-as-a-Service" product, a critical platform used by internal researchers to model next-generation data center architectures. Your mission is to ensure our simulations are accurate, performant, and accessible, while managing the large-scale compute clusters required to run them.

What you'll be doing:

  • Team Leadership: Manage and mentor a team of C++ software engineers and DevOps infrastructure engineers, fostering a culture of performance, reliability, and code quality.

  • Product Ownership (Sim-as-a-Service): Treat the internal simulation platform as a product. Work with research partners to define the roadmap, prioritize features, and ensure high availability for users.

  • High-Performance Simulation: Be responsible for the architecture and optimization of complex network simulation engines (C++ based), ensuring they can scale to model extensive data center topologies with high fidelity.

  • Infrastructure Management: Own the lifecycle of our on-premise compute clusters and servers. Drive decisions on hardware upgrades, prioritization, and system resource management.

  • DevOps & Automation: Lead the strategy for CI/CD pipelines, automated testing, and containerized deployments to ensure rapid iteration and stability of the simulation platform.

  • Cross-functional Collaboration: Partner with the AI Agents team to expose simulation APIs, enabling agents to run experiments and gather data autonomously.

What we need to see:

  • M.Sc., Ph.D., or equivalent experience in Computer Science, Electrical Engineering, or a related field.

  • 8+ years of hands-on software engineering experience, with a proven track record of leading technical teams in systems or infrastructure domains for 3+ years.

  • 3+ years of managerial experience.

  • C++ Expertise: Strong background in C++ development for high-performance applications (system-level programming, concurrent programming).

  • Infrastructure & DevOps: Practical experience managing on-premise servers, Linux environments, and modern DevOps tools (Kubernetes, Slurm, Docker, Ansible).

  • Operational Rigor: Ability to manage "heavy" operations: ensuring uptime, monitoring system health, and optimizing hardware utilization.

Ways to stand out from the crowd:

  • Networking Knowledge: Deep understanding of computer networking fundamentals (TCP/IP, Ethernet, InfiniBand, congestion control) and data center architectures.

  • Simulation/Modeling: Experience with discrete event simulation (DES) or modeling complex systems.

  • HPC Background: Experience working with MPI, CUDA, or other High-Performance Computing frameworks.

  • Specific Simulators: Familiarity with standard network simulators like OMNeT++, NS-3, or similar proprietary tools.

  • Hardware Knowledge: Understanding of switch micro-architecture or NIC design is a significant plus.

NVIDIA is home to some of the most innovative and dedicated professionals in the industry. We are committed to fostering a diverse work environment and are proud to be an equal-opportunity employer.

About NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

Industry: Hardware & Semiconductors
Company Size: 10,000+ employees
Headquarters: Santa Clara, CA
Year Founded: 1993