NeuReality

Senior SW Engineer – AI Infrastructure & Optimization

NeuReality  •  Kraków, PL (Onsite)  •  4 hours ago

Job Description

We are looking for a Senior Software Engineer to help build and optimize large-scale, high-performance GenAI infrastructure and inference systems on Kubernetes.

As AI workloads increasingly move toward Kubernetes-native infrastructure, we are building systems that support distributed inference, performance optimization, reliability, observability, and production-grade deployment at scale.

This role is ideal for an engineer who can reason deeply about systems, performance, tradeoffs, and reliability, and who is comfortable owning difficult technical decisions end-to-end.

You will work across inference serving, distributed systems, optimization, and Kubernetes-native AI infrastructure.

What You’ll Do

  • Build and optimize high-performance Kubernetes-native GenAI inference systems
  • Work with modern inference stacks such as vLLM, SGLang, TensorRT-LLM, and related tooling
  • Work with Kubernetes-native distributed LLM inference frameworks such as llm-d and NVIDIA Dynamo
  • Design and implement optimization algorithms and performance improvements
  • Improve reliability, observability, deployment, and operational maturity of AI systems
  • Make architectural decisions and take ownership of technical outcomes
  • Collaborate with a small, senior engineering team focused on performance and production quality

Requirements

Required Qualifications

  • Minimum 5 years of experience as a Software Engineer, with strong software engineering and system design skills
  • Programming experience in Go and Python
  • Hands-on experience with the Kubernetes ecosystem, including Operators, service meshes, GitOps, Gateway API, and OpenTelemetry
  • Experience with cloud platforms
  • Strong understanding of optimization algorithms and performance engineering
  • Ability to independently drive technical initiatives from concept to production
  • Strong systems thinking and debugging skills
  • Comfort operating in environments with high autonomy and responsibility

Nice to Have

  • Experience with modern LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM
  • Experience with distributed LLM inference frameworks such as llm-d or NVIDIA Dynamo
  • Contributions to open-source Kubernetes or ML infrastructure projects
  • GPU performance optimization and profiling experience
  • Familiarity with CUDA, NCCL, or Triton kernels
  • Experience running GenAI systems at scale in production

About NeuReality

NeuReality is a venture-backed deep tech AI startup transforming AI inferencing for data centers globally.

Our mission: to make AI accessible and ubiquitous. We break down the cost and complexity barriers that currently prevent over 60% of businesses and governments from enterprise adoption, making AI more profitable, sustainable, and simpler to deploy.

We champion purpose-built inference architecture. Our NR1 AI Inference Solutions, featuring our revolutionary NR1 Chip, integrate seamlessly with any GPU, AI accelerator, or AI model to unlock peak performance. As the world's first AI-CPU designed for ultimate cost and energy efficiency, the NR1 Chip redefines AI price/performance, delivering 6.5x more AI tokens per dollar and power envelope than legacy x86 CPUs.

From innovative NR Software to generative and agentic AI-ready NR1 Inference Appliances, NeuReality delivers breakthrough AI capabilities that are immediately accessible and economically viable for every business and government.

Industry: Hardware & Semiconductors
Company Size: 51-200 employees
Headquarters: Caesarea, IL
Year Founded: 2019