Business Area:
IT
Seniority Level:
Mid-Senior level
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.
At Cloudera, we believe that data can make what is impossible today, possible tomorrow. We empower organizations to transform complex data into clear and actionable outcomes. Join us in our mission to harness the power of data.
We are seeking a talented and curious Senior Data Scientist to join our fast-paced, data-driven organization. In this role, you will design and deliver AI-powered systems and applications that accelerate decision-making and enhance operational excellence.
You will combine strong statistical foundations, advanced programming expertise, and modern Generative AI techniques to build scalable, production-ready solutions. This is a builder-focused role. You will move beyond analysis to develop internal copilots, AI-enabled workflows, and reusable platform components that embed intelligence directly into business processes.
Our work empowers leadership and operational teams by creating measurable, AI-enabled capabilities. We seek a thoughtful and pragmatic innovator who is enthusiastic about GenAI, disciplined experimentation, and building durable internal AI infrastructure.
To succeed in this role, you will demonstrate technical depth, intellectual curiosity, and a strong builder mindset:
Data Science & Machine Learning Expertise: Proficiency in Python (or R) for data preparation, feature engineering, statistical modeling, and machine learning. Experience with core data science libraries (e.g., Pandas, NumPy, scikit-learn) and a solid understanding of supervised and unsupervised learning methods.
SQL & Data Fluency: Strong understanding of relational databases and the ability to quickly learn new schemas and data environments. Comfortable writing efficient, production-grade SQL to support modeling, experimentation, and AI-enabled applications.
Generative AI & LLM Engineering: Hands-on experience working with large language models (LLMs) and modern AI tooling. This includes prompt design, structured output generation, retrieval-augmented generation (RAG), evaluation strategies, and workflow automation. Ability to translate GenAI capabilities into reliable, enterprise-ready solutions that integrate with existing systems and data sources.
AI Application Development: Experience rapidly prototyping and iterating on internal applications, copilots, or AI-enabled workflow tools. Comfortable evolving prototypes into maintainable, production-grade solutions. Familiarity with modern development frameworks (e.g., Streamlit, Gradio, FastAPI, or similar) is beneficial.
Platform-Oriented Thinking: Demonstrated ability to design reusable components such as shared prompt libraries, retrieval pipelines, evaluation frameworks, and standardized integration patterns that enable scalable AI adoption.
Strong Mathematical and Statistical Foundation: Deep understanding of probability, statistical inference, experimentation, and quantitative reasoning to ensure model robustness and reliability.
Collaborative Development Experience: Experience working in collaborative environments such as Cloudera Data Science Workbench, Jupyter, Zeppelin, or similar platforms.
GitHub Proficiency: Experience using version control to support collaboration, code review, documentation, and long-term maintainability.
Exceptional Communication Skills: Ability to translate complex business challenges into technical solutions and clearly communicate findings, trade-offs, and recommendations to both technical and non-technical stakeholders.
As a Senior Data Scientist, you will apply rigorous analytical thinking and modern AI capabilities to design, build, and scale high-impact solutions. You will:
Design, develop, and deploy GenAI-powered internal applications, copilots, and workflow accelerators.
Build reusable AI components, including retrieval pipelines, structured prompting patterns, orchestration workflows, and evaluation harnesses.
Develop and maintain statistical and machine learning models to support automation, optimization, forecasting, and classification use cases.
Design retrieval strategies that connect LLMs to trusted internal knowledge sources, ensuring grounded and reliable outputs.
Implement evaluation and validation frameworks to measure quality, accuracy, and consistency of AI-driven systems.
Partner cross-functionally to identify high-value opportunities for AI enablement across the organization.
Create reusable datasets, feature pipelines, and experimentation frameworks to support iterative development.
Document methodologies, assumptions, and implementation details to ensure transparency and reproducibility.
Uphold high standards for quality, reliability, and responsible AI practices.
Contribute to peer review processes to ensure technical rigor and maintainability.
We are excited if you have (Required Experience):
5+ years of relevant experience in Data Science, Machine Learning, or AI-focused roles.
Demonstrated experience applying machine learning techniques in production or enterprise environments.
Hands-on experience building applications or workflows powered by large language models (LLMs).
Evidence of a builder mindset through shipped AI tools, internal platforms, or automation solutions.
Strong curiosity for emerging AI technologies and the ability to evaluate and adopt them responsibly.
Academic background in a quantitative discipline such as Statistics, Mathematics, Computer Science, Engineering, Economics, or a related field.
You may also have (Preferred Qualifications):
Experience designing internal AI platforms or shared enablement frameworks.
Familiarity with API-driven architectures and integrating AI capabilities into enterprise systems.
Experience with vector databases, embedding models, or semantic retrieval systems.
Exposure to responsible AI practices, governance frameworks, or model lifecycle management.
This role is not eligible for immigration sponsorship.
What you can expect from us:
Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Employee Resource Groups
EEO/VEVRAA

Cloudera is the only data and AI platform company that brings AI to data anywhere: in clouds, data centers, and at the edge. Cloudera delivers 100% of data in all forms, whether it is in Cloudera or anywhere in the entire data estate. The world’s largest organizations rely on Cloudera to fuel insights that boost bottom lines, safeguard against threats, and save lives. Learn more at Cloudera.com.
---------------------------------------------------------------------------------
Recruitment Fraud Alert
It has come to our attention that job seekers have been contacted about fake job opportunities with Cloudera from individuals fraudulently posing as Cloudera employees. These recruiting fraud schemes often include requests for personal information and payments.
Be aware that Cloudera will never request a payment as part of its recruitment process. Additionally, Cloudera will never make a job offer without conducting an interview process. Any information submitted to Cloudera in relation to a job application should only be through our official career portal (https://www.cloudera.com/careers.html). Email communications from Cloudera will come from an email address ending in @cloudera.com.
If you are the target of a recruiting scam, consider filing a report with law enforcement authorities. Cloudera is not responsible for fraudulent job offers and/or any claims, damages, expenses, or other inconvenience connected to recruiting scams.