University of Pennsylvania

Postdoctoral Researcher in AI-Driven Data Curation and Data Integration

University of Pennsylvania  •  United States (Onsite)

Job Description

Faculty Mentor: Joost Wagenaar
Department: Informatics
Number of Positions: 2
Open to applications from US Citizens and foreign nationals.

The Wagenaar Lab is seeking a highly motivated Postdoctoral Researcher to conduct research at the intersection of artificial intelligence, common data elements (CDEs), and large-scale biomedical datasets. The Wagenaar Lab is jointly based in the Institute for Biomedical Informatics and the Department of Biostatistics, Epidemiology, and Informatics at the University of Pennsylvania, and leads the academic development of the Pennsieve scientific data platform. The lab’s mission is to create scalable, sustainable infrastructure that enables data integration, reuse, and discovery across clinical and scientific research domains.

This postdoctoral position will focus on developing AI-enabled methods to automate and augment data curation, with an emphasis on leveraging CDEs to improve the usability, interoperability, and scientific value of public datasets. The successful candidate will work across disease areas—including Epilepsy, Immune Health, and programs within the NIH HEAL Initiative—to design approaches that harmonize heterogeneous datasets, enrich metadata, and support scalable data exploration.

The Postdoctoral Researcher will work closely with the Pennsieve development team and a broad network of scientific collaborators to translate industry best practices in data engineering and AI into the academic research ecosystem. A central goal of this role is to move beyond manual, project-specific curation toward reproducible, automated, and extensible curation workflows that can be applied across datasets, programs, and institutions.

In addition to platform and method development, the Postdoctoral Researcher is expected to contribute to peer-reviewed publications, open-source software, and community-facing resources that advance AI-enabled data stewardship and reuse.

Responsibilities

  • Develop AI-based methods and deploy them at scale to automate and augment data curation using Common Data Elements
  • Design workflows to harmonize, validate, and enrich public datasets across Epilepsy, Immune Health, and NIH HEAL programs
  • Develop novel mechanisms to interrogate, visualize, and interact with complex scientific datasets, increasing their value to the scientific community
  • Integrate curation methods into scalable, cloud-based scientific data platforms
  • Collaborate with the Pennsieve development team and scientific partners to align methods with real research workflows
  • Evaluate and validate curation approaches using large, heterogeneous public datasets
  • Prepare manuscripts, technical documentation, and presentations describing methods and outcomes

Qualifications

  • Ph.D. (preferred) or Master’s degree in Biomedical Informatics, Computer Science, Data Science, Bioinformatics, or a related field
  • Experience with machine learning, natural language processing, or AI applied to structured and unstructured data
  • Familiarity with Common Data Elements, data standards, or ontology-based data representation (preferred)
  • Strong programming skills in Python, Go, Java, or related languages, and experience with containerization tools such as Docker
  • Experience working with large-scale biomedical or clinical datasets
  • Experience with cloud-based data processing and scalable analytics environments (AWS preferred)
  • Strong written and verbal communication skills and an interest in interdisciplinary collaboration

Application Instructions

Please include a cover letter and a CV for consideration.

Equal Employment Opportunity Statement

The University of Pennsylvania is an equal opportunity employer. Candidates are considered for employment without regard to race, color, sex, sexual orientation, religion, creed, national origin (including shared ancestry or ethnic characteristics), citizenship status, age, disability, veteran status or any class protected under applicable federal, state, or local law.

About University of Pennsylvania

Website
upenn.edu