Middesk

Software Engineer, Data Platform

Middesk  •  $205k - $275k/yr  •  San Francisco, CA (Onsite)  •  3 months ago

Job Description

About Middesk

Middesk makes it easier for businesses to work together. Since 2018, we’ve been transforming business identity verification, replacing slow, manual processes with seamless access to complete, up-to-date data. Our platform helps companies across industries confidently verify business identities, onboard customers faster, and reduce risk at every stage of the customer lifecycle.

Middesk came out of Y Combinator, is backed by Sequoia Capital and Accel Partners, and was recently named to the Forbes Fintech 50 list and cited as an industry leader in business verification by the digital identity strategy firm Liminal.

About Middesk Engineering:

Middesk is, at its core, a data company. We live by the quality of our data assets and the engine that powers them. We’re on a mission to build a comprehensive dataset covering every business in the world. As part of the Data Platform team at Middesk, you’ll collaborate with the Data Science, Infrastructure, and Product Engineering teams to build and maintain our proprietary Entity Resolution system, which powers the Middesk business identity platform. You’ll scale that system to resolve millions of business identities across hundreds of data sources and thousands of distinct data sets. You’ll often work with and support product engineers looking to launch new products and features.

The Role:

We're looking for a senior engineer to own and drive the technical direction of our data platform. You'll design data infrastructure systems that operate reliably at scale, define the architecture for how we acquire, transform, and serve data to product teams, and partner closely with engineering leadership to align platform strategy with company priorities.

You'll be expected to independently scope complex projects from ambiguous problem statements, break them into incremental deliverables, and drive them to completion while keeping stakeholders informed and adapting when priorities shift. You'll also play a key role in elevating the engineering practices of the team through mentorship, code review, and setting technical standards.

What You'll Do:

  • Own platform architecture and technical direction for how we ingest, transform, and serve data across highly variable input formats - business entity data sourced from thousands of government agencies, registries, and third-party providers, each with its own schema, cadence, and reliability profile

  • Design and build systems for scale - both the infrastructure we need today and the infrastructure we'll need at 2–5x our current volume

  • Scope and drive complex projects end-to-end, breaking ambiguous problems into well-defined milestones with clear deliverables and timelines

  • Design AI-powered tooling to improve how we acquire and maintain data using LLMs, AI agents, and agent orchestration

  • Partner with product engineering, data science, and business teams to understand data needs and translate them into platform capabilities

  • Establish and maintain data governance and quality standards across the platform, ensuring the integrity and reliability of the data our customers depend on for compliance and risk decisions

What We’re Looking For:

  • 7+ years of professional software engineering experience, with meaningful time spent on data infrastructure, data engineering, or backend platform work (targeting Senior to Staff Level Engineers)

  • Experience designing and operating systems at meaningful scale, ideally within a larger or rapidly scaling engineering organization

  • Track record of independently owning and delivering complex, multi-milestone projects - from scoping through launch

  • Strong data modeling instincts and deep familiarity with SQL, pipeline orchestration (Airflow, Dagster, etc.), and data transformation patterns

  • Experience with distributed data processing frameworks (Spark, Flink, Beam, or similar) and an understanding of when and how to apply parallelization to scale pipelines beyond single-node limits

  • Proficiency in one or more of Python, Ruby, JavaScript/TypeScript, Java

Nice to Haves:

  • Experience working with or building AI/LLM-powered tooling or data products (strongly preferred)

  • Experience with scraper technologies, including agentic AI

  • Experience building and designing collections stored on Elasticsearch

  • Experience operating event-driven data pipelines using serverless compute (e.g., AWS Lambda, Google Cloud Functions, Azure Functions) and managed cloud services

  • Experience with Terraform, Datadog, or Kubernetes

About Middesk

Middesk's mission is to enable every business to access the products and services it needs to grow and thrive. We believe that if we can make it easy for a business to access financial products, hire new employees, and transact with other businesses, we increase the odds that the business succeeds and contributes to its community and the broader economy.

Our Identity product provides accurate, complete information that financial services companies need to make efficient onboarding decisions. Our Agent product makes it easy for employers to file with the state and federal agencies needed to establish their business across the country. Our customers include Affirm, Brex, Plaid, Mercury, Divvy, Rippling, Gusto, and others.

Based in San Francisco, CA, Middesk is backed by Sequoia Capital, Accel Partners, and Y Combinator.

Industry: IT & Software
Company Size: 51-200 employees
Headquarters: San Francisco, California
Year Founded: 2019