Job Description
We're opening eyes, hearts and minds to the impact that a pharmacy team can have in changing lives.
Join our group of talented, committed team members (pharmacists, pharmacy care coordinators, technologists, product strategists and more) to create and expand the delivery of personalized health support that people didn't even know was possible.
The Senior Data Engineer for Stellus Rx will be a key member of our Technology Team, working closely with Stellus Rx leaders and across the organization to unlock the health of millions of Americans. We are a culture that is unabashedly driven by purpose — making a difference to patients and team members while growing at an accelerated rate.
This role is built for a data engineer who uses AI as an active part of their workflow — accelerating pipeline development, automating data quality processes, and enabling richer, faster insights across our Cloud Analytics Data Platform rather than relying on manual, repetitive engineering approaches.
Role and Responsibilities:
AI-Augmented Pipeline Development & Automation
- Develop, construct, and maintain large-scale data processing systems that collect data from a variety of structured and unstructured sources — using AI code generation tools to accelerate pipeline authoring, reduce boilerplate, and improve code quality.
- Build and optimize ELT pipelines using AI-assisted tooling to identify bottlenecks, suggest optimizations, and automate routine pipeline maintenance tasks.
- Identify, design, and implement internal process improvements: use AI to automate manual processes, optimize data delivery, and redesign infrastructure for greater scalability, replacing manual analysis with AI-driven discovery of improvement opportunities.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from various sources; use AI to accelerate infrastructure-as-code authoring and configuration.
AI-Ready Data Preparation & ML Enablement
- Prepare data for data scientist exploration and discovery using AI-assisted data profiling and quality assessment tools — surfacing anomalies, schema drift, and data gaps faster than manual inspection allows.
- Perform data wrangling and munging for downstream analytics and machine learning; leverage AI tools to generate and validate transformation logic against business rules.
- Assemble large, complex datasets that meet functional and non-functional business requirements; use AI to rapidly evaluate dimensional modeling approaches and ontology alignment strategies.
- Enable large-scale machine learning by designing and maintaining annotated datasets, elastic search approaches, and scalable data lake structures that support AI/ML workloads.
Analytics Pipeline & Insight Generation
- Create and maintain analytics pipelines that generate data and insight to power business decision-making; use AI-assisted analysis to proactively surface trends, anomalies, and opportunities within pipeline outputs.
- Collaborate with data scientists, analysts, and business stakeholders on requirements for dimensional modeling, distributed ETL pipelines, and cross-repository data migration.
- Evaluate, compare, and improve design patterns, data lifecycle approaches, and data ontology alignment — using AI to model trade-offs and accelerate proof-of-concept validation.
- Work with data and analytics experts to continuously improve the functionality, reliability, and intelligence of data systems.
Root Cause Analysis & Quality Management
- Perform root cause analysis on internal and external data and processes using AI-assisted investigation tools — replacing slow, manual log and lineage review with faster, AI-accelerated diagnostics.
- Develop and maintain data quality frameworks; use AI to automate anomaly detection, schema validation, and data contract enforcement across pipelines.
- Develop a strong understanding of company domains, strategic direction, and user needs to ensure data systems are aligned to business outcomes, not just technical requirements.
Qualifications and Requirements:
- 4+ years of experience in a Data Engineer role.
- Graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.
- Advanced SQL knowledge and experience with relational databases and query authoring.
- Required: Demonstrated, hands-on experience using AI tools to accelerate data engineering tasks — pipeline development, data quality automation, code generation, or root cause analysis — with specific examples you can speak to.
- Experience building and optimizing data pipelines, architectures, and datasets.
- Strong analytic skills working with unstructured and disconnected datasets.
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational and NoSQL databases including Postgres and Cassandra.
- Experience with pipeline and workflow management tools: Airflow, Luigi, Azkaban, or similar.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
- Experience with stream-processing systems: Storm, Spark Streaming, or similar.
- Working knowledge of message queuing, stream processing, and highly scalable data stores.
- Proficiency in object-oriented/scripting languages: Python, Java, Scala, C++, or similar.
- Experience supporting cross-functional teams in dynamic, agile environments.
Preferred Experience:
- Experience designing or supporting data infrastructure for AI/ML model training, including annotated datasets and feature stores.
- Familiarity with AI-assisted data quality or observability platforms (e.g., Monte Carlo, Soda, or similar).
- Experience with LLM-based data processing pipelines or retrieval-augmented generation (RAG) architectures.
- Healthcare data experience; familiarity with FHIR/HL7 standards a plus.
- High proficiency in English.