Job Description

Company Background

Our client is a technology startup building advanced voice automation solutions for the quick-service restaurant industry. The company develops a privacy-conscious, high-performance drive-thru voice assistant that automates real-time customer interactions, improves order accuracy, and helps restaurant chains increase revenue and operational efficiency. Its product is designed for fast deployment in noisy, high-volume drive-thru environments and is already gaining market traction through cooperation with a major restaurant chain.

Project Description

The project is a next-generation voice automation engine for drive-thru order-taking in quick-service restaurants. It enables fully automated, real-time conversations between customers and the restaurant ordering system using modern speech recognition, natural language processing, and text-to-speech technologies.

The engineer will work on improving the core voice and audio capabilities of the platform, including Speech-to-Text, Text-to-Speech, noise cancellation, speech enhancement, and real-time audio pipelines. The main focus will be on reducing latency, improving recognition quality and speech clarity, and making the system robust in extremely noisy real-world environments such as drive-thrus.

Technologies

Speech-to-Text, Text-to-Speech
Audio Engineering / DSP
Noise Suppression, Voice Activity Detection, Signal Processing
PyTorch / TensorFlow
Real-Time Inference, Streaming Pipelines
GPU Optimization, Edge Inference
Production ML Systems

What You'll Do

Optimize low-latency, real-time Speech-to-Text pipelines for production drive-thru environments;
Improve Text-to-Speech naturalness, responsiveness, and overall conversational quality;
Design, tune, and improve noise suppression, echo cancellation, and speech enhancement systems;
Improve speech recognition accuracy and robustness under challenging acoustic conditions, including engine noise, weather, overlapping speech, poor microphone quality, and outdoor environments;
Build and scale audio processing infrastructure for production deployments;
Evaluate, benchmark, and compare speech models using real-world audio data and production scenarios;
Experiment with modern Speech AI technologies, models, and architectures to improve system performance;
Collaborate with LLM and conversational AI teams to improve end-to-end voice interaction quality.

Job Requirements

Advanced Python development skills;
Deep hands-on expertise with Speech-to-Text and Text-to-Speech systems;
Proven experience improving speech recognition quality in noisy or otherwise challenging acoustic environments;
Strong expertise in noise suppression, echo cancellation, voice activity detection, and speech enhancement;
Strong understanding of real-time and streaming audio architectures, including conversational voice pipelines and real-time inference;
Experience building low-latency, production-grade AI systems;
Experience with modern speech AI frameworks, models, and APIs;
Experience deploying and scaling AI services in cloud environments;
Ability to troubleshoot complex audio quality, latency, and reliability issues;
Product-oriented mindset with strong ownership and a focus on real-world performance and customer experience;
Ability to collaborate effectively with engineering, LLM, and conversational AI teams;
English level: B2 or higher.

What We Offer

The global benefits package includes:

Technical and non-technical training for professional and personal growth;
Internal conferences and meetups to learn from industry experts;
Support and mentorship from an experienced employee to help you grow and develop professionally;
Health insurance;
English courses;
Sports activities to promote a healthy lifestyle;
Flexible work options, including remote and hybrid opportunities;
Referral program for bringing in new talent;
Work anniversary program and additional vacation days.

About Coherent Solutions

Coherent Solutions is a digital product engineering company focused on empowering business success. Our global team of talented professionals collaborates seamlessly to deliver innovative solutions that drive measurable business impact. As a preferred digital product engineering partner and advisor, we leverage our expertise in software product development to help clients thrive in the digital age.

• 30 years in business
• 1000+ projects completed
• 95% client retention
• 10 global locations
• 1700+ global employees

Let’s discuss your project! Schedule a consultation at https://www.coherentsolutions.com.

Coherent Solutions. Empowering Business Success.

Industry

IT & Software

Company Size

1,001-5,000 employees

Headquarters

St Louis Park, Minnesota

Year Founded

1995

Website

coherentsolutions.lt

AI/ML Developer (Speech AI)