Job Description
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a Lead Software Engineer - Data Engineering at JPMorgan Chase within Global Banking Technology, you will lead the design and delivery of reliable, scalable data platforms and pipelines that power business-critical use cases, including analytics, search/retrieval, and AI-assisted workflows. You will be accountable for building high-quality curated datasets with clear contracts, lineage expectations, and measurable SLAs/SLOs, while ensuring strong controls across security, privacy, resiliency, and auditability. You will remain hands-on and will set the technical bar for engineering rigor across Java and Python implementations, including batch/stream processing, microservices, and shared libraries. You will also partner with product, UX, and platform teams to enable AI/ML and agentic patterns where they materially improve business outcomes, without compromising governance or operational discipline.
Job Responsibilities
- Design, build, and operate batch and streaming data pipelines that are reliable, observable, and cost-efficient, with clear runbooks and production support ownership.
- Lead data modeling and curation for domain datasets, including schema evolution, data contracts, lineage expectations, and consumer-facing SLAs/SLOs.
- Implement robust ETL/ELT workflows with strong validation controls, including reconciliation, completeness checks, anomaly detection, and automated alerting.
- Engineer high-throughput data processing solutions using a combination of Java and Python, selecting the right tool for performance, maintainability, and platform standards.
- Build and operate orchestration capabilities (for example, Airflow or equivalent), including scheduling, backfills, retries, dependency management, and operational SLAs.
- Take end-to-end ownership across architecture, engineering standards, CI/CD, and operational stability in a regulated enterprise context.
- Deliver transformation pipelines using modern transformation frameworks (for example, dbt or equivalent), with strong testing, repeatability, and release discipline.
- Develop and maintain supporting services and APIs (REST and/or gRPC) that expose curated data products and enable downstream consumers, using clean architecture and well-defined contracts.
- Build and maintain search and indexing pipelines (for example, Elasticsearch) that support discovery, retrieval, analytics, and RAG-style experiences.
- Establish engineering and data quality standards across code review, automated testing, performance tuning, observability (logs/metrics/traces), resiliency patterns, and incident response.
- Partner with security, risk, and controls teams to ensure data solutions meet governance expectations, including access controls, secrets handling, least privilege, and auditability.
- Enable secondary AI/ML capabilities by delivering the data foundations required for evaluation, guardrails, tool/function integration, and traceable AI-assisted workflows, including MCP-style integration patterns where applicable.
- Drive team adoption of enterprise-authorized AI-assisted engineering practices within the work environment to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test strategy acceleration, incident/root-cause analysis support), while establishing consistent validation standards (secure coding, peer review, automated testing) and promoting reuse of effective patterns across the team.
- Apply knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 5+ years of applied experience.
- Hands-on engineering experience delivering production-grade platforms and data systems, with demonstrated recent experience as a lead Data Engineer building and operating curated datasets and production pipelines end-to-end.
- Strong hands-on proficiency in both Java and Python in production environments, including performance-minded development, design patterns, and maintainable codebases.
- Advanced SQL skills, with strong capability in data modeling, schema design, and schema evolution.
- Proven experience with pipeline orchestration (for example, Airflow or equivalent), including operational controls (SLAs, alerting, retries, backfills).
- Proven experience with transformation frameworks (for example, dbt or equivalent) and strong testing practices for transformations and data quality.
- Experience processing large-scale datasets with a clear track record of optimizing for performance, scalability, reliability, and cost.
- Experience designing and implementing large-scale batch processing jobs (for example, Spring Batch or equivalent enterprise batch frameworks).
- Hands-on experience building and operating search/indexing workflows (for example, Elasticsearch) at scale.
- Strong SDLC discipline: code reviews, unit/integration testing, CI/CD, release hygiene, and production support ownership.
- Secure engineering fundamentals: authentication/authorization, secrets management, least privilege, secure coding, and policy enforcement patterns (including familiarity with OPA or similar policy-as-code approaches).
- Strong communication and cross-functional leadership across engineering, product, UX, platform, and control partners.
- Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security.
- Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching engineers on safe, compliant adoption within delivery practices.
Preferred qualifications, capabilities, and skills
- Experience with event streaming and asynchronous architectures (for example, Kafka) and event-driven processing patterns.
- Experience with large-scale processing engines (for example, Spark or equivalent) and distributed compute cost governance.
- Cloud-native delivery experience (for example, AWS), including containers and Kubernetes, with strong operational excellence practices.
- Experience building LLM/GenAI-enabled applications, including RAG patterns, evaluation approaches, and safety controls, with a disciplined approach to governance and traceability.
- Familiarity with agentic architectures, including orchestrators, tool/function integrations, workflow/state management, and MCP-style integration concepts.
- Experience delivering in regulated environments with strong risk, control, and audit requirements.