Job Description

Requirements:

4 years of experience as a fullstack or backend engineer
Strong proficiency in Python and JavaScript/TypeScript
Experience with FastAPI / Django / Node.js and React / Next.js
Solid understanding of distributed systems and async architectures
Hands-on experience deploying LLMs such as GPT-4/4.1, Claude, LLaMA, Mistral, Mixtral
Experience serving models using vLLM, Triton, TGI, or similar frameworks
Strong understanding of transformer models and inference trade-offs
Experience with embeddings, vector search, and RAG architectures
Experience with AWS, GCP, or Azure (GPU workloads preferred)
Strong Docker and Kubernetes experience
Familiarity with CI/CD pipelines for ML systems
Experience with observability tools (Prometheus, Grafana, OpenTelemetry)
Experience with multimodal AI (audio, video, image models)
Experience optimizing LLM inference costs at scale
Startup or high-growth environment experience
Prior work on AI-first or AI-native products

Responsibilities:

Deploy and optimize LLMs (open-source and commercial) for production use
Implement inference optimization techniques (quantization, batching, caching, distillation)
Build and maintain RAG pipelines (embeddings, vector databases, retrieval strategies)
Evaluate and improve model quality (latency, accuracy, hallucination reduction, cost)
Implement prompt management, versioning, and A/B testing
Design and develop scalable APIs for AI-driven features
Deploy and manage model-serving infrastructure (Docker, Kubernetes, GPUs)
Optimize hardware utilization for inference workloads
Implement monitoring, logging, and alerting for AI services
Ensure security, data privacy, and compliance across AI pipelines
Build internal tools and user-facing interfaces for AI workflows
Integrate LLM services into web and mobile applications
Work closely onsite with product managers, designers, and data teams
Rapidly prototype, test, and iterate on AI-powered features

About HR POD - Hiring Talent Globally

At HR POD, we recruit the top 3% of global tech talent for software companies and startups, revamp teams with strategic expertise, and reskill individuals through personalized training. We aim for every hire to drive innovation and growth, backed by a 91% success rate.

Serving the US, EU, KSA, UAE, and Pakistan, we go beyond placements to build future-ready, human-centered, high-performance workforces. We provide tailored solutions that meet today’s challenges while anticipating tomorrow’s needs. Let's shape the future of work together.

Industry: Consulting & Advisory
Company Size: 11-50 employees
Headquarters: Hor Al Anz, AE
Year Founded: 2023
Website: hr-pod.com

Fullstack AI Engineer (Onsite, Lahore, PKR Salary)
