Job Description

Data Architect

About the Role

We are seeking a Data Architect with deep Big Data Engineering expertise to design and modernize large-scale, cloud-native data platforms. This role emphasizes distributed data processing, real-time pipelines, data platform automation, and GenAI enablement on top of strong Big Data foundations.

Key Responsibilities

Architect and govern enterprise Big Data platforms (data lake, lakehouse, warehouse, real-time).
Design high-volume, high-velocity data pipelines using batch and streaming frameworks.
Lead implementation of distributed processing architectures (Spark, PySpark, EMR).
Build event-driven and real-time streaming solutions (Kafka, Kinesis, Flink).
Define ETL/ELT patterns, metadata-driven pipelines, and reusable ingestion frameworks.
Drive data platform automation (Airflow/Step Functions, CI/CD, data quality, observability).
Optimize performance, scalability, fault tolerance, and cost across Big Data workloads.
Integrate GenAI architectures (LLMs, embeddings, vector databases, RAG) with enterprise data lakes.
Ensure security, governance, lineage, and compliance across data platforms.
Provide hands-on leadership and technical mentoring to data engineering teams.

Required Technical Skills & Experience

12+ years in Big Data Engineering / Data Architecture roles.
Expert-level experience with Spark, PySpark, SQL, and distributed compute engines.
Strong knowledge of the AWS Big Data stack: S3, EMR, Glue, Athena, Redshift, Lambda, Step Functions.
Hands-on experience with Snowflake (performance tuning, data sharing, optimization).
Expertise in streaming platforms: Kafka, Kinesis, Flink, or Spark Streaming.
Strong experience with data modeling (dimensional, Data Vault 2.0).
Proficiency in Python, schema evolution, partitioning, and data versioning.
Experience with orchestration and automation tools (Airflow, Dagster, CI/CD).
Working knowledge of GenAI data integration (feature stores, vector DBs, RAG pipelines).
Experience with Agile delivery and leading globally distributed engineering teams.

Responsibilities for Internal Candidates

Key Responsibilities

Architect and govern enterprise Big Data platforms (data lake, lakehouse, warehouse, real-time).
Design high-volume, high-velocity data pipelines using batch and streaming frameworks.
Lead implementation of distributed processing architectures (Spark, PySpark, EMR).
Build event-driven and real-time streaming solutions (Kafka, Kinesis, Flink).
Define ETL/ELT patterns, metadata-driven pipelines, and reusable ingestion frameworks.
Drive data platform automation (Airflow/Step Functions, CI/CD, data quality, observability).
Optimize performance, scalability, fault tolerance, and cost across Big Data workloads.
Integrate GenAI architectures (LLMs, embeddings, vector databases, RAG) with enterprise data lakes.
Ensure security, governance, lineage, and compliance across data platforms.
Provide hands-on leadership and technical mentoring to data engineering teams.

Bachelor's Degree

About EXL

Choosing a digital partner is about more than capabilities — it’s about collaboration and character.

Unrealistic overhauls and off-the-shelf products ignore what matters most — your unique needs, culture, goals, and your legacy data and technology environments.

At EXL, our collaboration is built on ongoing listening and learning to adapt our methodologies. We’re your business evolution partner—tailoring solutions that make the most of data to make better business decisions and drive more intelligence into your increasingly digital operations.

Whether your goals are scaling the use of AI and digital, redesigning operating models, or driving better and faster decisions, we’re here to partner with you to help you gain and maintain competitive advantage with efficient, sustainable models at scale.

Our expertise in transformation, data science, and change management helps make your business more efficient and effective, improve customer relationships, and enhance revenue growth. Instead of focusing on multi-year, resource- and time-intensive platform designs or migrations, we look deeper at your entire value chain to integrate strategies with impact.

We use our specialization in analytics, digital interventions, and operations management, alongside deep industry expertise, to deliver solutions that help you outperform the competition.

At EXL, it’s all about outcomes—your outcomes—and delivering success on your terms. Share your goals with us and together, we’ll optimize how you leverage data to drive your business forward.

For more information, visit www.exlservice.com.

Industry

Consulting & Advisory

Company Size

10,000+ employees

Headquarters

New York, NY

Year Founded

Unknown

Website

exlservice.com
