Job Description
The Alexa Edge AI team is seeking a talented and motivated Applied Scientist to join our newly established team in Bangalore. In this role, you will design, develop, and deploy state-of-the-art machine learning models spanning computer vision (CV), audio (including speech) processing, and multimodal semantic understanding for both edge and cloud deployment. You will work at the intersection of multiple modalities to build systems that can perceive, interpret, and reason about the world — pushing the boundaries of what's possible in unified multimodal intelligence. This is a unique opportunity to be a founding member of a brand-new site, shaping the team culture, technical direction, and research agenda from the ground up.
Key job responsibilities
Model Development: Design and build deep learning models for computer vision, audio understanding, and multimodal semantic fusion — including architectures that enable joint reasoning across visual, auditory, and textual modalities.
End-to-End Ownership: Own the full ML lifecycle — from problem formulation, data strategy, and annotation design through experimentation, evaluation frameworks, model optimization, and deployment at scale.
Research & Innovation: Stay at the frontier of CV, audio ML, and multimodal learning; identify and apply cutting-edge techniques and contribute to the scientific community through papers at top-tier venues (CVPR, NeurIPS, ICASSP, ICCV, ACL).
Mentorship & Culture Building: As a founding member of the Bangalore site, help hire and onboard new team members, and establish the technical practices that define the team's culture.
A day in the life
An Applied Scientist with the Alexa Edge AI team will support science solution design, run experiments, research new algorithms, and find new ways of optimizing the customer experience, while setting an example for the team on good science practice and standards. Besides theoretical analysis and innovation, an Applied Scientist will also work closely with talented engineers and scientists to put algorithms and models into production.
About the team
The Alexa Edge AI team's mission is to deliver best-in-class, resource-efficient multimodal AI models that support perception (vision, audio, and speech) and semantic understanding applications for Amazon devices such as the Echo Show series.
Basic Qualifications
- PhD, or Master's degree and 3+ years of CS, CE, ML or related field experience
- 1+ years of experience building models for business applications
- Experience programming in Java, C++, Python or related language
- Experience developing and implementing deep learning algorithms, particularly with respect to computer vision algorithms
Preferred Qualifications
- Knowledge of standard speech and machine learning techniques
- Experience with video and image processing and compression algorithms and standards, computer vision and/or machine learning
- Experience creating and delivering written and oral communications for technical and non-technical audiences
- Experience with distributed training, model compression, and inference optimization (e.g., pruning, quantization, distillation)
- Demonstrated ability to work in ambiguous, fast-paced environments and define technical roadmaps independently
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.