Job Description

We are looking for a Senior Software Engineer to help build and optimize large-scale, high-performance GenAI infrastructure and inference systems on Kubernetes.

As AI workloads increasingly move toward Kubernetes-native infrastructure, we are building systems that support distributed inference, performance optimization, reliability, observability, and production-grade deployment at scale.

This role is ideal for an engineer who can reason deeply about systems, performance, tradeoffs, and reliability, and who is comfortable owning difficult technical decisions end-to-end.

You will work across inference serving, distributed systems, optimization, and Kubernetes-native AI infrastructure.

What You’ll Do

Build and optimize high-performance Kubernetes-native GenAI inference systems
Work with modern inference stacks such as vLLM, SGLang, TensorRT-LLM, and related tooling
Work with Kubernetes-native distributed LLM inference frameworks such as llm-d and NVIDIA Dynamo
Design and implement optimization algorithms and performance improvements
Improve reliability, observability, deployment, and operational maturity of AI systems
Make architectural decisions and take ownership of technical outcomes
Collaborate with a small, senior engineering team focused on performance and production quality

Requirements

Required Qualifications

Minimum 5 years of experience as a Software Engineer, with strong software engineering and system design skills
Programming experience in Go and Python
Hands-on experience with the Kubernetes ecosystem, including Operators, service meshes, GitOps, Gateway API, and OpenTelemetry
Experience with cloud platforms
Strong understanding of optimization algorithms and performance engineering
Ability to independently drive technical initiatives from concept to production
Strong systems thinking and debugging skills
Comfort operating in environments with high autonomy and responsibility

Nice to Have

Experience with modern LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM
Experience with distributed LLM inference frameworks such as llm-d or NVIDIA Dynamo
Contributions to open-source Kubernetes or ML infrastructure projects
GPU performance optimization and profiling experience
Familiarity with CUDA, NCCL, or Triton kernels
Experience running GenAI systems at scale in production

About NeuReality

NeuReality is a venture-backed deep tech AI startup transforming AI inferencing for data centers globally.

Our mission: to make AI accessible and ubiquitous. We break down the cost and complexity barriers that currently keep over 60% of businesses and governments from adopting AI, making it more profitable, more sustainable, and simpler to deploy.

We champion purpose-built inference architecture. Our NR1 AI Inference Solutions, featuring our revolutionary NR1 Chip, integrate seamlessly with any GPU, AI Accelerator, or AI Model to unlock peak performance. As the world's first AI-CPU designed for ultimate cost and energy efficiency, the NR1 Chip redefines AI price/performance, delivering 6.5x more AI m/tokens per dollar and power envelope than legacy x86 CPUs.

From innovative NR Software to generative and agentic AI-ready NR1 Inference Appliances, NeuReality delivers breakthrough AI capabilities that are immediately accessible and economically viable for every business and government.

Industry

Hardware & Semiconductors

Company Size

51-200 employees

Headquarters

Caesarea, IL

Year Founded

2019

Website

neureality.ai

Senior SW Engineer – AI Infrastructure & Optimization