Deep expertise in AWS data, streaming, and compute services, including Kinesis, Flink, Elasticsearch, PostgreSQL, Lambda / Fargate, Step Functions, S3, Apache Iceberg, Glue, Datadog, and Redshift Spectrum
Strong experience in real-time and batch data processing architectures, including event-driven and microservices-based data platforms
Advanced proficiency in SQL, Python, and Spark for large-scale data processing, transformations, and analytics
Hands-on experience with CI/CD pipelines, Git, and Infrastructure-as-Code (Terraform / CloudFormation) for automated deployments
Proven expertise in workflow orchestration tools (Airflow, Dagster, Step Functions) and pipeline scheduling
Strong understanding of data modeling techniques (dimensional, normalized, and lakehouse architectures)
Deep knowledge of partitioning strategies, indexing, and query performance optimization
Experience with data lake and lakehouse architectures (S3 + Iceberg / Delta-like patterns)
Familiarity with observability and monitoring tools (Datadog, CloudWatch) for pipeline health and performance tracking
Knowledge of data governance, lineage, cataloging, and metadata management frameworks
Understanding of security and access control mechanisms (IAM roles, RBAC, encryption, data masking)
Experience with cost optimization strategies in AWS (storage tiering, compute optimization, efficient query design)

Roles and Responsibilities

1. Data Platform Engineering
Design, build, and scale robust ETL/ELT pipelines for structured and unstructured data sources
Implement both batch and real-time data ingestion frameworks
Ensure pipelines are fault-tolerant, reusable, and scalable

2. Data Architecture & Design
Define and implement end-to-end data architecture across ingestion, processing, storage, and consumption layers
Drive adoption of lakehouse architecture and modern data patterns
Establish data contracts and schema evolution strategies

3. Performance & Optimization
Optimize pipelines and queries for high performance, cost efficiency, and scalability
Analyze and resolve bottlenecks in processing, storage, and query execution
Implement efficient partitioning, indexing, and caching strategies

4. Data Governance, Security & Compliance
Implement data governance frameworks, including:
Data quality checks
Data lineage and traceability
Metadata management
Ensure compliance with security standards and regulatory requirements
Implement fine-grained access control, encryption, and auditing mechanisms

5. Observability & Reliability Engineering
Build and maintain monitoring, logging, and alerting frameworks
Define and track SLAs / SLIs / SLOs for data pipelines
Ensure high availability, fault tolerance, and incident response readiness

6. Automation & DevOps Practices
Develop and maintain CI/CD pipelines for data applications
Enable infrastructure automation using Terraform / CloudFormation
Promote DevOps best practices across data engineering workflows

7. Stakeholder Collaboration & Solutioning
Collaborate with business stakeholders to translate requirements into scalable technical solutions
Partner with analytics, risk, and reporting teams to ensure data usability and accessibility
Provide technical guidance and feasibility analysis for new use cases