Job Description

About Wati

Started as a WhatsApp team inbox in 2020, Wati has evolved into a full revenue orchestration system that goes beyond a single platform. We empower businesses that sell, support, and grow through conversations by observing customer intent in real-time, deciding the next best revenue action, and executing it seamlessly across marketing, sales, and support—all within WhatsApp and connected messaging channels.

Our Platform & AI Capabilities

Wati is designed for scalability and intelligence. Our AI-native platform simplifies complex customer communication operations through a unified inbox, a robust multi-channel messaging infrastructure, and no-code automation. At the heart of our solution is Astra, our intelligent AI layer, which lets you create AI agents for every customer interaction across all your messaging platforms. By integrating AI agents into the ecosystem, we enable businesses of all sizes to deliver measurable ROI and build deeper customer relationships.

Our Backing & Partnerships

Trusted by over 16,000 customers across 190+ countries, Wati is proudly backed by world-class investors including Tiger Global, Sequoia Capital, DST Global, and Shopify. As a Premium-tier Partner of Meta and Google, we maintain the highest standards of platform excellence and integration.

About the Role

We are looking for an Agentic Engineer – Voice AI to build and scale Wati's real-time voice AI capabilities on WhatsApp.

In this role, you will develop the systems that allow AI agents to listen, think, and speak in real time over WhatsApp voice calls. This includes building and optimizing the real-time media pipeline (WebRTC, LiveKit), integrating frontier AI models (OpenAI Realtime API, Google Gemini Live), and engineering the cascade architecture that connects speech recognition, language models, and speech synthesis into a seamless conversational experience.

You will work on latency-critical infrastructure where every millisecond matters — from audio transport and voice activity detection to model inference and text-to-speech delivery. You will also contribute to the broader AI agent stack, including tool calling, context management, and multi-turn conversation orchestration.

This role sits at the intersection of real-time communication systems, AI model integration, and conversational voice experiences.

What You Will Do

• Design, build, and optimize real-time voice AI pipelines — from WebRTC media transport to LLM inference and speech synthesis

• Integrate and orchestrate frontier AI models, including the OpenAI Realtime API and Google Gemini Live, alongside cascade architectures (ASR → LLM → TTS)

• Build and maintain the media infrastructure: LiveKit-based audio routing, Opus codec handling, RTP/RTCP transport, and voice activity detection

• Develop agent capabilities for voice interactions — tool calling, function execution, context engineering, and multi-turn conversation management

• Optimize end-to-end latency across the voice pipeline, from audio capture to AI response playback

• Collaborate with product and platform teams to deliver production-grade voice AI experiences on WhatsApp

• Ensure reliability, performance, and scalability of voice AI infrastructure serving customers across 190+ countries

Requirements

• 3+ years of software engineering experience, with strong backend development skills (Go or Python preferred)

• Experience with real-time communication technologies: WebRTC, RTP/RTCP, audio codecs, or media server infrastructure

• Familiarity with AI/LLM integration — model APIs, tool calling, prompt engineering, or agent orchestration

• Experience with or strong interest in speech technologies: ASR, TTS, voice activity detection, or audio processing pipelines

• Understanding of distributed systems, microservices, and cloud-native architectures (GCP preferred)

• Comfortable working with PostgreSQL, Redis, and pub/sub messaging systems

• Strong problem-solving skills and the ability to work in fast-paced, ambiguous environments

• Ability to debug complex, latency-sensitive systems by reading code, traces, and real-time metrics

Nice to Have

• Hands-on experience with the OpenAI Realtime API, Google Gemini Live, or similar real-time AI model APIs

• Experience with AI agent frameworks (Dify, LangChain, CrewAI, etc.)

• Familiarity with MCP (Model Context Protocol) or other agent integration standards

• Contributions to open-source projects in the AI or real-time communication space

Behavioural Expectations

• Strong ownership and bias for action in a fast-moving environment

• Proactive, self-driven, and comfortable working across teams to drive outcomes

• AI-native mindset: actively using AI tools in your daily engineering workflow and experimenting with agent frameworks and emerging technologies

• Curiosity-driven — eager to push the boundaries of what voice AI can do in real business contexts

About Wati

Reimagining customer engagement, Wati is a leading conversational platform built on the WhatsApp Business API. Our easy-to-use software empowers 10,000+ businesses across 160+ countries to deliver personalised, real-time conversations at scale.

With innovative AI solutions, we're transforming how companies communicate:

• Shared inboxes allow seamless collaboration

• Powerful automation boosts efficiency

• Broadcast messaging engages customers

• Intelligent chatbots provide instant support

As a fast-growing global SaaS startup, we're passionate about using technology to build meaningful relationships between businesses and customers. Our talented, driven team is united by a vision to empower organizations and redefine connections through meaningful conversations.

Industry

IT & Software

Company Size

201-500 employees

Headquarters

Hong Kong, HK

Year Founded

2019

Website

wati.io
