Zensar Technologies

Data Engineers (Big Data Hadoop, Scala, Spark, Ozone/Iceberg/Airflow)

Zensar Technologies  •  Republic of India (Hybrid)  •  2 hours ago

Job Description

To excel as a Data Engineer specialized in this modern, high-performance big data stack, you need to master a specific blend of distributed computing, modern storage architectures, functional programming, and workflow orchestration.

1. Functional Programming & Apache Spark
  • Scala Core Mastery: You must understand functional programming paradigms, immutable data structures, pattern matching, and implicit parameters.

  • Spark Core & Architecture: Deep knowledge of the internal workings of Apache Spark, including the Catalyst Optimizer, Tungsten execution engine, lazy evaluation, Directed Acyclic Graphs (DAGs), and memory management (execution vs. storage memory).

  • Performance Tuning: Ability to identify and resolve performance bottlenecks such as data skew and OOM (Out Of Memory) errors, optimize joins (Broadcast vs. Sort-Merge), manage partition sizes, and avoid expensive shuffle operations.

  • Structured APIs & Streaming: Proficiency in the Spark DataFrame/Dataset APIs and Spark Structured Streaming for low-latency, real-time data processing.

2. Next-Generation Storage & Table Formats
  • Apache Iceberg: Expertise in implementing Iceberg as your open table format over a data lake. You must master features like ACID transactions, time travel, schema evolution, hidden partitioning, and row-level updates/deletes.

  • Apache Ozone: Understanding Ozone as a scalable, redundant, and distributed object store designed specifically for Hadoop environments. You should know how it replaces or coexists with HDFS to handle billions of small and large files efficiently.

  • Storage Optimization: Skills in managing data compaction (merging small files), snapshot isolation, and choosing optimal file formats like Parquet, ORC, or Avro.

3. The Hadoop Ecosystem Foundation
  • HDFS & YARN: While industry focus is shifting toward object storage, you still need a strong understanding of HDFS architecture (NameNode, DataNode) and YARN resource management (ResourceManager, NodeManager) to debug legacy systems or manage hybrid environments.

  • Hive & Metastore Management: Ability to manage catalog metadata and run distributed SQL queries over your distributed storage system.

4. Workflow Orchestration
  • Apache Airflow: Mastery of building, scheduling, and monitoring complex data pipelines using Python-based DAGs.

  • Advanced Airflow Concepts: Using the TaskFlow API, custom XComs, dynamic task mapping, and efficient Task Groups.

  • Orchestration Integration: Knowing how to safely trigger, monitor, and pass parameters to external Spark jobs (or Cloud/Databricks operators) without overloading the Airflow worker nodes.

5. Architectural & Cross-Functional Skills
  • Data Lakehouse Architecture: Designing unified platforms that combine the cost-effective storage of data lakes with the data management structures of data warehouses.

  • CI/CD & DataOps: Writing clean, testable Scala/Python code using unit-testing frameworks (like ScalaTest) and automating deployments using Git, Docker, and CI/CD pipelines.

  • Advanced SQL: Writing complex queries and analytical window functions, and diagnosing execution plans. Even when writing Spark code, SQL remains foundational.

At Zensar, we’re “experience-led everything.” We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client, a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus.

Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. Explore Life at Zensar and join us to Grow. Own. Achieve. Learn. to be the best version of yourself.

We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.

About Zensar Technologies

Zensar stands out as a premier technology consulting and services company, embracing an ‘experience-led everything’ philosophy. We are creators, thinkers, and problem solvers passionate about designing digital experiences that are engineered into scale-ready products, services, and solutions to deliver superior engagement to high-growth companies. This full lifecycle capability – from experience to engineering to engagement – is what makes us unique. This integrated approach also means that we harness the power of technology, creativity, and insight to deliver impact — ensuring our work focuses not just on technology but also on the people who use it.

Part of the $4.4 billion RPG Group, Zensar is headquartered in Pune, India. Our 10,000+ employees work across 30+ locations worldwide, including Seattle, Princeton, Cape Town, London, Singapore, and Mexico City. As an organization, we are diverse and multi-dimensional and unite across geographies and skill sets to deliver products and services that are value-driven, environmentally conscious, and human-centered.

To know more, visit us at www.zensar.com.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: Pune, IN
Year Founded: 2001