RoboForce

Senior / Staff AI Research Engineer, Real-Time Inference

RoboForce  •  Milpitas, CA (Onsite)  •  1 month ago

Job Description

Why RoboForce


RoboForce is an AI robotics company developing Physical AI–powered Robo-Labor for dull, dirty, and dangerous work. The company's robots are engineered for demanding industrial environments, with a focus on real-world deployment and scalability.
We are looking for a Senior / Staff AI Research Engineer, Real-Time Inference to make embodied AI practical on the edge. In this role, you will drive the full stack of model optimization, from CUDA kernel engineering to quantization and compression, to deploy high-performance AI models on the edge compute platforms powering RoboForce robots in the field.

Responsibilities


  • Develop and optimize inference pipelines for embodied AI models (VLA, perception, world models) targeting real-time execution on edge hardware such as NVIDIA Jetson platforms.

  • Implement CUDA-level optimizations including custom kernels, memory layout tuning, and hardware-aware graph compilation to minimize model latency.

  • Apply and advance model compression techniques — quantization (INT8/FP16/INT4), pruning, distillation, and structured sparsity — to achieve production-grade throughput on constrained devices.

  • Profile and debug end-to-end inference stacks using tools such as Nsight, TensorRT, and Triton to identify and eliminate performance bottlenecks.

  • Collaborate with ML research and robotics teams to co-design model architectures that meet real-time control-loop latency requirements.

  • Establish benchmarking frameworks to evaluate model performance across latency, throughput, power consumption, and accuracy tradeoffs on target hardware.

Requirements


  • Master's degree in Computer Science, Electrical Engineering, or a related field with 4+ years of experience, or a PhD.

  • Deep expertise in CUDA programming, GPU architecture, and low-level kernel optimization, including custom kernel authoring with tools such as Triton.

  • Hands-on experience with model quantization, pruning, distillation, and deployment using frameworks such as TensorRT, ONNX Runtime, TVM, or Triton.

  • Proficiency in C++ and Python; strong systems programming and performance profiling skills.

  • Experience deploying ML models on edge or embedded hardware (e.g., NVIDIA Jetson, Orin, or equivalent ARM/GPU SoCs).

  • In-office collaboration with the teams 5 days per week.

Bonus Qualifications


  • Familiarity with embodied AI models — VLA, multimodal transformers, or diffusion-based policies — and their inference characteristics.

  • Familiarity with compiler-based optimization pipelines such as XLA, torch.compile, or MLIR for graph-level model acceleration.

  • Understanding of robotics system constraints such as control-loop timing, sensor fusion latency, and memory bandwidth limits on edge SoCs.

  • Publication or production work in efficient deep learning or on-device ML systems.

Benefits


  • Competitive stock options/equity programs.

  • Health, dental, and vision insurance; 401(k) plan.

  • Visa sponsorship and green card support for qualified candidates.

  • Lunches and dinners, a fully stocked kitchen, and regular team-building events.

About RoboForce

RoboForce is building the future of Physical AI — scalable, deployable Robo-Labor designed for demanding industrial environments.

Industry: Architecture & Engineering
Company Size: 11-50 employees
Headquarters: Milpitas, California
Year Founded: 2023