apna

Lead / Staff Data Engineer - Data Platform

apna  •  Bengaluru, IN (Onsite)  •  3 hours ago

Job Description

Company: Apna

Team: Data Platform / Engineering

Location: Bangalore

Experience: 5-7 years

Why Join Apna

At Apna, data is central to how we build products, understand users, improve employer outcomes, power recommendations, and scale decision-making. This role gives you the opportunity to build the backbone of Apna’s data platform and influence how data is used across the company.

You will work on real-world, high-scale problems across jobs, users, employers, communities, matching, growth, and AI-driven systems.

About the Role

Apna is looking for a Lead / Staff Data Engineer to build and scale our core data platform. This role will work on large-scale data pipelines, lakehouse architecture, query platforms, workflow orchestration, and data reliability systems that power analytics, product intelligence, machine learning, business dashboards, experimentation, and operational decision-making across Apna.

We are looking for someone who can think deeply about data architecture, design reliable pipelines, improve data quality, and help build a platform that can scale with Apna’s growth.

What You’ll Own

You will be responsible for designing, building, and operating critical parts of Apna’s data platform, including:

  • Building scalable batch and near-real-time data pipelines across product, business, growth, and ML use cases.
  • Designing and improving our lakehouse architecture using technologies like Apache Hudi.
  • Working with query engines such as Presto / Trino for large-scale analytical workloads.
  • Building and maintaining orchestration workflows using Apache Airflow.
  • Creating reusable data models, curated datasets, and reliable data marts for analytics and product teams.
  • Improving data platform reliability, observability, SLA tracking, lineage, and data quality checks.
  • Optimizing storage, compute, query performance, and pipeline costs.
  • Partnering with product, analytics, ML, and backend engineering teams to understand data needs and convert them into scalable platform solutions.
  • Driving engineering standards around data modeling, schema evolution, partitioning, deduplication, backfills, replayability, and pipeline ownership.
  • Mentoring data engineers and influencing architecture decisions across teams.

What We’re Looking For

Must Have

  • Strong experience in data engineering, preferably at scale.
  • Hands-on experience with Apache Airflow or similar orchestration systems.
  • Strong knowledge of Presto / Trino or other distributed query engines.
  • Good understanding of Apache Hudi concepts such as:
    • Copy-on-write vs merge-on-read
    • Upserts and deletes
    • Incremental reads
    • Compaction
    • Clustering
    • Timeline and commits
    • Schema evolution
    • Partitioning strategy
  • Strong knowledge of distributed data processing and storage systems.
  • Ability to design and build reliable ETL / ELT pipelines.
  • Strong SQL skills and ability to debug complex data issues.
  • Good understanding of different data architectures, including:
    • Data warehouse
    • Data lake
    • Lakehouse
    • Lambda architecture
    • Kappa architecture
    • Medallion architecture
    • Event-driven data architecture
  • Experience with data modeling for analytics and reporting.
  • Strong programming skills in at least one language such as Python, Java, or Scala.
  • Ability to reason about trade-offs between freshness, cost, reliability, latency, and complexity.
  • Strong debugging and production ownership mindset.

Good to Have

  • Experience with Kafka, Spark, Flink, Hive, Iceberg, Delta Lake, or BigQuery.
  • Experience building internal data platforms or self-serve data infrastructure.
  • Experience with data quality frameworks such as Great Expectations, Deequ, Soda, or custom validation systems.
  • Exposure to ML feature pipelines or feature stores.
  • Experience with metadata management, data catalogs, lineage, and governance.
  • Experience with cloud infrastructure such as AWS, GCP, or Azure.
  • Understanding of privacy, compliance, PII handling, and access control in data systems.

What Success Looks Like

In this role, success means:

  • Critical business and product datasets are reliable, discoverable, and trusted.
  • Pipelines are observable, recoverable, and have clear SLAs.
  • Query performance improves across major analytical workloads.
  • Data freshness and quality issues drop significantly.
  • Teams can build on top of the data platform faster without reinventing pipelines.
  • The platform can scale with Apna’s user, job, employer, and engagement data.

About apna

Founded in 2019, Apna Group is redefining the future of work for India and beyond - empowering millions of professionals and enterprises through AI-led innovation.

Through Apna.co, India’s largest early-career job marketplace, we’ve connected 6 Cr+ job seekers with 7 Lakh+ employers across 900+ cities, enabling faster, smarter, and more meaningful hiring at scale.

Trusted by India’s leading enterprises such as Teleperformance, Zomato, HDB Financial Services, Bluestar, TVS, Kotak, Axis Bank, Flipkart, and Lifestyle, Apna powers workforce transformation across Retail, BFSI, Staffing, Healthcare, Manufacturing, and IT sectors.

Building on this foundation, Apna has expanded into enterprise AI innovation with Blue Machines, our Voice AI platform that enables organizations to deploy production-grade voice agents with sub-300 ms latency and <1-week deployment cycles. In its first 45 days, Blue Machines secured $6 M+ in enterprise contracts across lending, insurance, recruitment, and healthcare - making it one of India’s fastest-adopted deep-tech platforms.

Explore more at bluemachines.ai

Recognized among India’s Most Preferred Workplaces 2025–26, and as a Most Preferred Workplace for Women, Apna Group continues to build technology that empowers people, strengthens enterprises, and drives inclusive growth.

Backed by world-class investors including Tiger Global, Sequoia Capital, Lightspeed, Insight Partners, GSV Ventures, and Owl Ventures, Apna collaborates with leading government and public institutions such as the NSDC, Ministry of Defence, UNICEF YuWaah, and AICTE to drive nationwide skilling and employability programs - powering AICTE’s career portal for over 3 million students across 22,000 colleges.

Visit: apna.co

For employer solutions, visit: employer.apna.co

Industry: IT & Software
Company Size: 1,001-5,000 employees
Headquarters: Bengaluru, IN
Year Founded: 2019
Website: apna.co