Protege

Forward Deployed Data Scientist (Healthcare)

Protege  •  Remote  •  22 days ago

Job Description

We are building Protege to solve the biggest unmet need in AI — getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

As a Forward Deployed Data Scientist (Healthcare Solutions Lead) in the Healthcare vertical, you will guide prospects and customers through the definition and delivery of healthcare datasets. Your job will be to understand what customers are building, identify the data that best fits their needs, and assemble and QA high-quality samples and final deliveries that meet their technical and conceptual specs. Along the way, you’ll ensure timelines and milestones are clearly communicated from the first stages of feasibility to the final data delivery.

What You Will Own

  • Serve as the primary point of contact for customers, building long-term strategic relationships through close collaboration on data and transparency about its delivery from Protege's network

  • Lead end-to-end program management from data specification and preparation through QA and delivery, ensuring cross-functional coordination and on-time execution

  • Work with Protege data partners to source cutting-edge healthcare data into the Protege ecosystem

  • Oversee the QA, packaging, and delivery of complex datasets (EHR, claims, radiology, pathology, unstructured text), ensuring HIPAA compliance in collaboration with privacy partners

Who You Are

  • Proven customer-facing experience: skilled at managing expectations, leading customer conversations, and delivering technical outcomes with clarity and confidence

  • Bring an analyst-first mindset to challenges. You are an expert in using SQL and Python to query data to construct complex patient cohorts, analyze data readiness for model training, validate clinical coverage, and support other customer-specific needs

  • Find satisfaction in bringing order to multiple simultaneous projects and masterfully juggling competing (and sometimes changing) priorities

  • Deep expertise in healthcare data modalities, including EHR, claims, radiology, pathology, and unstructured text

  • Familiarity with privacy-preserving techniques for healthcare data

  • Experience in healthcare AI, ML products, or enterprise data platforms

  • Prior startup experience

  • You treat those around you with kindness

Why Protege

  • Be the connective tissue between Protege’s platform, our data, and our customers

  • Build datasets that directly power the next generation of AI models

  • Operate at the cutting edge of multimodal data — where human judgment meets machine intelligence


About Protege

The biggest unmet need in AI today is getting access to the right training data. Data holders often don’t know where to start and are rightly concerned about governance, intellectual property, and security implications. AI companies can spend years finding and negotiating access to the data they need.

Protege is solving these problems by providing an easy-to-use platform to connect data holders with vetted data users.

Industry
IT & Software
Company Size
51-200 employees
Headquarters
New York City, New York
Year Founded
2024