Boehringer Ingelheim

Senior Data Engineer

Boehringer Ingelheim  •  London, GB (Hybrid)  •  3 days ago

Job Description

THE AI ACCELERATOR

The AI Accelerator is a brand-new, London-based hub, sitting within Computational Innovation (CI), which is a global organisation comprising computational biology, human genetics, data excellence and AI expertise.

The purpose of CI’s AI Accelerator is to provision production-quality, versatile, foundational biomedical AI capabilities that can be adapted and deployed to improve and accelerate portfolio decision-making and increase the probability of success, by furthering understanding of the biology driving patient outcomes and identifying mechanisms involved in disease.

A core component of the AI Accelerator is AI Enablement, a team focused on ensuring that the accelerator’s model provisioning teams can design, build and deploy versatile biomedical foundation models that can enhance human understanding of disease biology and help identify potential targets, biomarkers and patient segments for further research.

This will be achieved by provisioning AI-ready, integrated, multimodal data for distributed training, managing the model lifecycle and partnering with the IT organisation to ensure that model builders and downstream users have the necessary infrastructure and tooling to prototype, implement, adapt and deploy AI capabilities to advance the portfolio.

THE POSITION

We are seeking a Senior Data Engineer to join the AI Enablement team (@computationalinnovation) and contribute to the design and delivery of robust data engineering pipelines that transform harmonised biomedical datasets into AI-ready, integrated assets across multi-omics, clinical and health records, and medical imaging data.

You will be an experienced, independent data engineer within AI Enablement, owning significant data engineering workstreams within the broader technical direction and architecture set by the Senior Staff Data Engineer. The pipelines and integrated datasets you build will enable model training, fine-tuning and inference.

Key Responsibilities

  • Transform harmonised datasets into AI-ready assets suitable for large model pre-training and fine-tuning within the defined standards and specifications
  • Build and maintain entity linking pipelines that connect patients and biomedical entities across modalities
  • Build and maintain cross-modal integration pipelines to support multimodal training, fine-tuning and inference
  • Ensure pipelines and datasets are built and operated in accordance with data access permissions, consent conditions and usage restrictions
  • Maintain data lineage and provenance throughout
  • Build and maintain biomedical benchmark datasets with versioning and documentation
  • Write clean, well-tested, well-documented code that meets the required engineering standards
  • Contribute to code reviews within the data engineering team
  • Stay current with advances in data engineering tooling and practices relevant to biomedical AI

Required Qualifications

  • PhD in Machine Learning, Computer Science, Bioinformatics, Computational Biology or a related quantitative field
  • Strong hands-on experience in data engineering for machine learning
  • Experience working with at least one biomedical data modality in a data engineering context
  • Practical experience with entity linking or record linkage, ideally in a biomedical or clinical context
  • Strong understanding of biomedical data characteristics, such as variant data formats, expression matrices and clinical coding standards (e.g. SNOMED and ICD-10)
  • Proficiency with modern data engineering tools
  • Familiarity with data governance frameworks applicable to biomedical and clinical data
  • Familiarity with Trusted Research Environments or controlled access biomedical data environments
  • Experience with biomedical ontology systems and identifier mapping across modalities
  • Contributions to open-source data engineering or bioinformatics tooling

Second-round interviews will take place in the weeks commencing 22nd and 29th June.

This is a hybrid role, with approximately 3 days a week in the office.

WHY THIS IS A GREAT PLACE TO WORK

Boehringer Ingelheim has been recognised as a Top Employer in the UK, demonstrating our commitment to building an exceptional workplace through strong people practices and supportive HR policies.

To learn more about why BI is a great place to work, visit:

https://www.boehringer-ingelheim.co.uk/careers/uk-careers/why-great-place-work

About Boehringer Ingelheim

Our people are our strength. And together with over 54,000 colleagues, we're creating the next breakthrough in the world of healthcare and innovation for both humans and animals. Are you ready to join us? #LifeForward

Privacy Notice: https://www.boehringer-ingelheim.com/privacy-notice

Imprint: https://www.boehringer-ingelheim.com/imprint

Industry: Chemicals & Materials
Company Size: 10,000+ employees
Headquarters: Ingelheim am Rhein, DE
Year Founded: Unknown