Serbia - AI Workloads Engineer

NextSilicon  •  Serbia (Onsite)  •  3 months ago

Job Description

NextSilicon is reimagining high-performance computing (HPC) and AI. Our accelerated compute solutions use intelligent adaptive algorithms to vastly accelerate supercomputers, driving them into a new generation. We have developed a novel software-defined hardware architecture that delivers significant advances in both the HPC and AI domains.

At NextSilicon, everything we do is guided by three core values:

  • Professionalism: We strive for exceptional results through unwavering dedication to quality and performance.
  • Unity: Collaboration is key to success. That's why we foster a work environment where every employee feels valued and heard.
  • Impact: We're passionate about developing technologies that make a meaningful impact on industries, communities, and individuals worldwide.

The AI Workloads team is responsible for modeling and enabling end-to-end AI workflows on NextSilicon’s next-generation hardware platforms. As an AI Workloads Engineer in Belgrade, you’ll build workflow modeling infrastructure, run and adapt open-source AI systems, and use real workloads to drive performance improvements from chip design through production.

Requirements

  • 4+ years of experience in software engineering.
  • Strong Python and PyTorch development experience.
  • Solid understanding of LLMs and modern inference workflows (e.g., KV cache, paged attention, speculative/assisted decoding, batching/scheduling); a toy KV-cache decode loop is sketched after this list.
  • Experience running, profiling, and instrumenting open-source AI inference systems (e.g., vLLM or similar).
  • Proficiency in C++ for developing software that models or interacts with hardware execution behavior (latency, dataflow, memory access patterns).
  • Experience with distributed inference and collectives (e.g., NCCL) and parallelism strategies (TP/PP/EP) is an advantage.
  • Experience with dynamic batching systems (e.g., vLLM, TensorRT-LLM) is an advantage.
  • Familiarity with MLPerf Inference benchmarks and methodology (Server/Offline scenarios, latency constraints, request arrival patterns) is an advantage.
  • Experience programming custom kernels (e.g., CUDA, Triton, or similar) is an advantage.
  • Background in performance analysis, simulation, compiler/runtime profiling, or workload modeling is an advantage.
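
For a concrete flavor of the inference concepts above, here is a minimal, self-contained PyTorch sketch of a KV-cache decode loop (an illustration with invented shapes and weights, not NextSilicon code): each step projects only the newest token into keys and values and appends them to the cache, instead of recomputing key/value projections for the full prefix.

    # Toy single-head attention decode loop with a KV cache.
    # All dimensions, weights, and inputs are invented for illustration.
    import torch

    torch.manual_seed(0)
    d_model = 64
    wq = torch.randn(d_model, d_model) / d_model ** 0.5
    wk = torch.randn(d_model, d_model) / d_model ** 0.5
    wv = torch.randn(d_model, d_model) / d_model ** 0.5

    def attend(q, k, v):
        # q: (1, d); k, v: (t, d) -> scaled dot-product attention over the cache.
        scores = (q @ k.T) / d_model ** 0.5       # (1, t)
        return torch.softmax(scores, dim=-1) @ v  # (1, d)

    k_cache, v_cache = [], []
    x = torch.randn(1, d_model)  # stand-in for the current token's embedding
    for step in range(8):
        # Project only the new token and append; cached K/V for the prefix
        # are reused, which is the point of a KV cache.
        k_cache.append(x @ wk)
        v_cache.append(x @ wv)
        out = attend(x @ wq, torch.cat(k_cache), torch.cat(v_cache))
        x = out  # stand-in for feeding the next token back in
        print(f"step {step}: cache holds {len(k_cache)} tokens")

Production engines such as vLLM build on this same pattern, adding paged allocation of the cache and continuous batching across concurrent requests.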

Responsibilities

  • Model and analyze end-to-end AI workflows (e.g., assisted decoding, dynamic batching, dynamic KV cache, MLPerf-like scenarios) on NextSilicon platforms, from simulation through production.
  • Run and adapt open-source AI workloads, collecting and analyzing metrics such as latency, throughput, and traversal or arrival statistics (see the timing-harness sketch after this list).
  • Use SDK and framework-integration tools to profile full-stack behavior, identify performance bottlenecks, and drive improvements with compiler, runtime, and hardware design teams.
  • Prototype custom kernels or runtime components when needed to enable or optimize new AI workflows on NextSilicon hardware.
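
As an illustration of the metric collection described above (a hedged sketch, not the team's actual tooling), the harness below times a stand-in generation call and reports throughput plus p50/p99 latency. Here fake_generate is a hypothetical stub; in practice it would be replaced by a real engine call such as vLLM's llm.generate.

    # Toy latency/throughput harness around a stubbed inference call.
    import random
    import statistics
    import time

    def fake_generate(prompt: str) -> str:
        # Hypothetical stand-in for an inference engine call; the sleep
        # mimics a decode pass of variable duration.
        time.sleep(random.uniform(0.01, 0.03))
        return prompt + " ...completion"

    latencies = []
    start = time.perf_counter()
    for i in range(100):
        t0 = time.perf_counter()
        fake_generate(f"request {i}")
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    p99_idx = min(len(latencies) - 1, int(0.99 * len(latencies)))
    print(f"throughput:  {len(latencies) / elapsed:.1f} req/s")
    print(f"p50 latency: {statistics.median(latencies) * 1e3:.1f} ms")
    print(f"p99 latency: {latencies[p99_idx] * 1e3:.1f} ms")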

About NextSilicon

We believe in a smarter future and want to create new opportunities for innovation. To achieve this, we're rethinking compute architectures for the next generation of processing.

Industry
Hardware & Semiconductors
Company Size
201-500 employees
Headquarters
Giv'atayim, IL
Year Founded
2017