Job Description

Data Engineer (Production Support) for AWS EMR with Spark, Scala, and Talend or Other ETL Tool Experience

We are seeking a highly skilled and motivated Data Engineer specializing in Production Support for AWS EMR (Elastic MapReduce), with knowledge of Spark, Scala, and Talend or another ETL tool, to join our dynamic team. The ideal candidate will ensure the smooth operation, performance, and stability of large-scale distributed data processing pipelines and applications deployed on AWS EMR. This role requires a mix of strong technical expertise, problem-solving skills, and operational excellence.

Key Responsibilities

1. Production Support

Monitor, troubleshoot, and resolve issues in real time for AWS EMR clusters and associated data pipelines.
Investigate and debug data processing failures, latency issues, and performance bottlenecks.
Provide support for mission-critical production systems as part of an on-call rotation.
Apply analytical and problem-solving skills to the Big Data domain, drawing on strong exposure to object-oriented concepts and implementation.

2. Cluster Management

Manage AWS EMR cluster lifecycle, including creation, scaling, termination, and optimization.
Ensure effective resource utilization and cost optimization of clusters.
Apply patches and upgrades to EMR clusters and software components as needed.

3. Data Pipeline Maintenance

Maintain and support ETL/ELT pipelines built on tools such as Apache Spark, Hive, or Presto running on EMR.
Ensure data quality, consistency, and availability across pipelines and storage systems like S3, Redshift, MySQL, or Snowflake.
Implement and monitor automated workflows using AWS tools like Step Functions, Lambda, and CloudWatch.

4. Performance Optimization

Analyze and optimize EMR job performance by tuning Spark/Hive configurations and improving query efficiency.
Identify and address inefficiencies in data storage and access patterns.
Provide optimal solutions for performance enhancement and fine-tuning of current applications.

5. Monitoring and Reporting

Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance.
Develop alerting mechanisms and dashboards for proactive issue identification.
Provide daily/weekly monitoring reports on job status and alert on any long-running or resource-intensive issues.

6. Collaboration and Documentation

Collaborate with software developers, data scientists, and DevOps teams to resolve issues and optimize workflows.
Maintain comprehensive documentation for troubleshooting guides, operational workflows, and best practices.

Required Skills and Qualifications

Technical Expertise:

Proficiency in managing AWS services, particularly EMR, S3, Lambda, Step Functions, and CloudWatch.
Hands-on experience with distributed data processing frameworks like Apache Spark, Hive, or Presto.
Experience with Kafka, NiFi, Amazon Web Services (AWS), Maven, Ambari-TEZ, Stash, and Bamboo.
Familiarity with data loading tools like Talend and Sqoop, and with cloud databases like AWS Redshift, Aurora MySQL, and PostgreSQL.
Knowledge of workflow schedulers like Oozie or Apache Airflow.
Strong knowledge of shell scripting, Python, or Java for scripting and automation.
Familiarity with SQL and query optimization techniques.

Operational Skills:

Experience in production support for large-scale distributed systems or data platforms.
Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios.
Ability to apply data modelling concepts and methodologies to optimize data warehouse solutions.
Ability to manage detailed Standard Operating Procedures (SOPs) using flow diagrams, source-to-target mappings, system architecture diagrams, and use cases.

Problem-Solving:

Strong analytical skills to debug complex systems and resolve performance bottlenecks.

Soft Skills:

Effective communication skills to coordinate with cross-functional teams.
A proactive and customer-focused attitude to provide excellent production support.

Preferred Skills

Experience with CI/CD tools like Jenkins or GitLab for pipeline deployments.
Familiarity with containerization and container orchestration tools (e.g., Docker, Kubernetes).
Knowledge of data governance, security, and compliance in cloud environments.
AWS certifications (e.g., AWS Certified Big Data - Specialty or AWS Certified Solutions Architect).

Education and Experience

Bachelor’s degree in Computer Science, Engineering, or a related field.
10+ years of experience in data engineering, production support, or a similar role, including at least 3-5 years on the AWS Cloud platform.

About NTT DATA

NTT DATA, a part of NTT Group, is an IT and business services company headquartered in Tokyo. We help clients transform through consulting, industry solutions, business process services, digital & IT modernization, and managed services. NTT DATA enables them, as well as society, to move confidently into the digital future. We are committed to our clients’ long-term success and combine global reach with local client attention to serve them in over 50 countries around the globe.

Industry

IT & Software

Company Size

10,000+ employees

Headquarters

Tokyo, JP

Website

nttdata.com

Work Arrangement

Remote (work from home)