Job location: Remote in India
About the role
We are looking for a seasoned Machine Learning Engineer with strong expertise in deep learning to design and build production-grade models in the speech/audio and computer vision domains. The Machine Learning Engineer will own the end-to-end model development lifecycle, from dataset curation and architecture design through training, optimization, and deployment, and will work closely with software and product teams to ship low-latency, scalable AI features.
What you will be expected to do
Research, design, train, and productionize deep learning models for speech/audio (ASR, speaker diarization, audio classification, TTS, noise suppression) and computer vision (detection, segmentation, classification, video understanding) use cases.
Architect training pipelines capable of handling large-scale datasets, and manage data preprocessing, augmentation, and versioning workflows.
Select and adapt state-of-the-art architectures (Transformers, CNNs, RNNs, diffusion models, etc.), and fine-tune or distill pre-trained models to meet production constraints.
Optimize models for inference using quantization, pruning, and knowledge distillation, targeting latency, throughput, and memory budgets on CPU, GPU, and edge hardware.
Work with software engineers to integrate models into production systems, and design APIs and microservices for model serving.
Define and track evaluation benchmarks, monitor model performance in production, and drive continuous improvement cycles.
Stay current with the research literature, evaluate and implement relevant state-of-the-art (SOTA) techniques, and contribute to internal technical forums.
You might be a strong candidate if you have/are
B.Tech / M.Tech / M.Sc. / PhD in Computer Science, Electrical Engineering, Signal Processing, or a related field.
4+ years of experience in machine learning/deep learning engineering with significant hands-on work in speech/audio and/or computer vision in production environments.
Deep expertise in PyTorch (primary) and TensorFlow/Keras; familiarity with JAX is a plus.
Strong experience with speech/audio models and toolkits: Wav2Vec 2.0, Whisper, ESPnet, SpeechBrain, torchaudio, librosa.
Solid background in computer vision: YOLO, DETR, ViT, EfficientNet, SAM, and standard CV pipelines (OpenCV, torchvision, Albumentations).
Hands-on experience with model inference optimization: TensorRT, ONNX Runtime, TorchScript, DeepSpeed, or Triton Inference Server.
Strong Python and software engineering fundamentals: OOP, clean code, testing, CI/CD.
Proficient in SQL for data extraction and pipeline logging.
Experience with distributed training frameworks (DDP, DeepSpeed, FSDP) and large-scale data pipelines.
Familiarity with cloud ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML) and containerized deployment (Docker, Kubernetes, ECS).
Good knowledge of system design principles for ML services: throughput, latency, scalability, fault tolerance.
What Sun King offers
Professional growth in a dynamic, rapidly expanding, high-social-impact industry.
An open-minded, collaborative culture made up of enthusiastic colleagues who are driven by the challenge of innovating for profound impact on people and the planet.
A truly multicultural experience: You will have the chance to work with and learn from people from different geographies, nationalities, and backgrounds.
Structured, tailored learning and development programs that help you become a better leader, manager, and professional through the Sun King Center for Leadership.
About Greenlight Planet
Powering access to brighter lives in Africa, Asia, and beyond