At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.
Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.
The AI Inference Engineer plays a critical role in the AI lifecycle by bridging the gap between high-performance model development and optimized deployment environments. This position focuses on optimizing Large Language Models (LLMs) for inference, serving diverse environments, from GPU-rich data centers to resource-constrained edge devices, with a strong emphasis on maximizing throughput, minimizing latency, and maintaining model accuracy.
This role is pivotal in advancing F5’s AI capabilities, ensuring enterprise-grade reliability by leveraging hardware acceleration, designing scalable infrastructure, and monitoring system performance.
Key Responsibilities
High-Performance AI Serving
Build and maintain robust inference engines using tools like vLLM, TGI (Text Generation Inference), and NVIDIA Triton, ensuring high performance at scale.
Handle deployment optimizations to deliver low-latency AI serving solutions for multiple business applications.
Hardware Acceleration and Optimization
Profile and optimize models for specialized hardware backends, including NVIDIA GPUs (CUDA/TensorRT), Apple Silicon (CoreML), and AI accelerators like TPUs and LPUs.
Collaborate with hardware teams to maximize utilization and performance across various computational environments.
Inference Orchestration and Scalability
Design and implement auto-scaling architectures for online (real-time) and batch inference pipelines, leveraging Kubernetes for inference routing and orchestration.
Ensure software solutions are optimized for peak performance during traffic spikes, maintaining reliability and scalability.
Performance Monitoring and Observability
Establish robust observability frameworks to monitor Time to First Token (TTFT), tokens per second, and memory bandwidth utilization against service-level agreements (SLAs).
Build and execute performance and load testing suites to identify bottlenecks and ensure consistent reliability at scale.
Technical Requirements
Required Skills:
Programming Languages: Proficiency in programming languages such as Python, C++, Rust, or Golang, specifically for high-performance AI workflows.
Inference Tools: Proven hands-on experience with tools like vLLM, TensorRT, Llama.cpp, and Ollama for inference development and optimization.
Infrastructure Expertise: Strong familiarity with infrastructure technologies, including Docker, Kubernetes, and cloud platforms such as AWS, GCP, and Azure.
Hardware Optimization Expertise: Comprehensive understanding of GPU and AI hardware, including techniques for profiling and optimizing performance for accelerators like NVIDIA GPUs and TPUs.
Preferred Experience:
Prior experience deploying Large Language Models (LLMs) with advanced techniques like Speculative Decoding or PagedAttention.
Contributions to open-source inference libraries or hardware-level kernel development (e.g., CUDA, Triton kernels).
Background in MLOps or SRE roles focused on high-performance AI endpoints and reliability during demand surges.
Proficiency in designing scalable solutions for high-throughput inference environments optimized for traffic bursts.
Success Metrics (KPIs):
Latency Reduction: Continuously improve inference latency metrics, ensuring minimal Time to First Token (TTFT) and maximum tokens per second.
Cost Efficiency: Achieve lower "Cost per 1K Tokens" through better resource utilization and hardware optimization.
Scalability: Maintain system stability and reliability during traffic spikes, ensuring performance consistency across environments.
Throughput Maximization: Deploy models optimized for peak hardware usage and maximized process throughput.
Why Join F5?
F5 empowers you to push boundaries in AI optimization and high-performance engineering. Joining our team means:
Collaborating with cutting-edge technologies and hardware solutions to support real-time AI applications.
Advancing your career in a fast-paced, multidisciplinary environment focused on innovation, scalability, and problem-solving.
Driving transformative projects that deliver real-time AI reliability to global customers while maintaining cost and efficiency standards.
Working on advanced MLOps solutions that seamlessly scale enterprise AI systems and shape the future of intelligent deployment.
What Success Looks Like:
As an AI Inference Engineer at F5, success is measured by your ability to:
Combine technical expertise and problem-solving skills to deliver low-latency, scalable, and high-performing AI prediction systems.
Collaborate efficiently across cross-functional teams, participating in knowledge sharing and system refinement.
Demonstrate initiative by driving optimizations across hardware, tools, and orchestration processes, balancing immediate solutions with long-term architectural goals.
Translate complex AI and inference workflows into practical solutions that align with F5's strategic objectives.
The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or an automated email notification from Workday (ending with f5.com or @myworkday.com).
Equal Employment Opportunity
It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. Request an accommodation by contacting accommodations@f5.com.

F5, Inc. (NASDAQ: FFIV) is the global leader that delivers and secures every app. Backed by three decades of expertise, F5 has built the industry’s premier platform, F5 Application Delivery and Security Platform (ADSP), to deliver and secure every app, every API, anywhere: on-premises, in the cloud, at the edge, and across hybrid, multicloud environments. F5 is committed to innovating and partnering with the world’s largest and most advanced organizations to deliver fast, available, and secure digital experiences. Together, we help each other thrive and bring a better digital world to life.