
Project Overview
Join a cutting-edge initiative focused on building advanced AI voice infrastructure for Arabic-speaking markets. The project involves developing state-of-the-art Arabic speech technologies tailored to regional Arabic dialects, including Egyptian, Gulf, Levantine, and others.
We are seeking a highly skilled Senior Applied Machine Learning Engineer with deep expertise in speech and audio technologies. In this role, you will design, fine-tune, and optimize advanced machine learning models for Arabic voice applications. You will work across the full development lifecycle, from data pipeline construction and model experimentation to inference optimization and production deployment.
This position is ideal for engineers who are passionate about transforming cutting-edge research into scalable, low-latency systems that support natural and accurate Arabic speech interactions.
Key Responsibilities
Benchmark and evaluate TTS and ASR models using Arabic-specific test sets, measuring metrics such as Word Error Rate (WER), naturalness, and dialect coverage.
Fine-tune generative models for voice cloning, zero-shot speaker adaptation, and speech synthesis.
Build and maintain Arabic-focused data pipelines, including:
Audio collection and preprocessing
Diacritization (Tashkil)
Data cleaning and augmentation
Optimize model inference for production environments using:
Quantization
KV-cache tuning
Streaming inference techniques
Integrate and evaluate complete speech-to-speech conversational pipelines.
Conduct experiments based on recent research papers and convert findings into production-ready solutions.
Collaborate with engineering and product teams to deploy robust and scalable speech systems.
Required Qualifications
5+ years of experience in Machine Learning, Applied AI, or AI Research.
Strong programming skills in Python.
Extensive hands-on experience with PyTorch and the Hugging Face ecosystem.
Proven experience training and fine-tuning neural models for:
Text-to-Speech (TTS)
Automatic Speech Recognition (ASR)
Audio codecs
Deep understanding of modern speech architectures such as:
Whisper
Conformer
HiFi-GAN
Diffusion-based models
Experience with audio processing techniques including:
Voice Activity Detection (VAD)
Speaker Diarization
Neural Vocoders
Demonstrated ability to implement and adapt research papers into practical production experiments.
Strong understanding of Arabic language challenges, including:
Diacritization (Tashkil)
Dialectal variations
Code-switching
Experience with inference optimization techniques such as:
Quantization
Streaming inference
NVIDIA TensorRT
Preferred Qualifications

Nile Bits aims to provide the best software development services, delivering robust, scalable, and cost-effective software solutions. Our team of top-class professionals offers proven expertise to ensure the quality and reliability of the products we develop for you. We emphasize meeting the unique business needs of our clients.
If you want outsourced development at competitive prices without sacrificing quality, we can help!
Developers, join us now!
https://www.nilebits.com/careers/