Virtusa

Lead Data Engineer - Azure Databricks/Kafka

Virtusa  •  Dubai, AE (Onsite)  •  2 hours ago

Job Description

Lead Data Engineer - Azure Databricks/Kafka (CREQ262573)

Design and develop streaming ingestion pipelines using Apache Spark Structured Streaming and Databricks Auto Loader. The pipelines consume files from cloud storage or messages from Kafka, RabbitMQ, or Confluent Cloud and ingest them into Delta Lake, supporting schema evolution and exactly-once semantics.

Implement CDC and deduplication logic. Capture change events from source databases using Debezium, the built-in CDC features of SQL Server or Oracle, or other connectors, and apply watermarking and drop-duplicates strategies based on primary keys and event timestamps.

Scale ingestion through configuration. Build a config-driven framework, for example with Airflow, DBX Jobs, or Delta Live Tables, that iterates over metadata tables to deploy and update ingestion pipelines for hundreds of tables and sources without code duplication.

Implement monitoring, observability, and security. Capture streaming query metrics and publish them to monitoring platforms such as Prometheus and Grafana, set up dashboards for lag, files processed, and processing duration, and enforce role-based access control, encryption, and data masking.

Participate in DevOps processes. Use CI/CD pipelines such as Jenkins or GitHub Actions to automate job deployment, manage infrastructure with Terraform or similar tools, and follow best practices for version control and code reviews.

This role requires 5–8 years of experience designing and building data pipelines using Apache Spark, Databricks, or equivalent big data frameworks. It also requires hands-on expertise with streaming and messaging systems such as Apache Kafka, Confluent Cloud, RabbitMQ, or Azure Event Hubs, including creating producers, consumers, and topics and integrating them into downstream processing.

Candidates should have:

A deep understanding of relational databases and CDC, with proficiency in SQL Server, Oracle, or other RDBMSs and experience capturing change events using Debezium or native CDC tools.

Proficiency in a programming language such as Python, Scala, or Java.

Solid knowledge of SQL for data manipulation and transformation.

Cloud platform expertise, specifically with Azure or AWS services for data storage, compute, and orchestration.

Knowledge of data lakehouse architectures, Delta Lake, partitioning strategies, and performance optimization.

Familiarity with Git, CI/CD pipelines, and infrastructure-as-code is also essential.

Primary Location

AE-DU-Dubai

Schedule

Full Time

Employee Status

Individual Contributor

Job Type

Experienced

Travel

No

Job Posting

02/07/2026, 7:49:57 AM

About Virtusa

Virtusa is a global product and platform engineering services company that makes experiences better with technology. We help organizations grow faster, more profitably, and more sustainably by reimagining enterprises through domain-driven solutions. We combine strategy, design, and engineering, backed by unmatched expertise at the intersection of industry, business, and technology to generate real-world business impact for clients.

Headquartered in Massachusetts with global delivery centers, Virtusa provides a broad range of services, solutions, and assets, including strategy and design, AI advisory and services, digital engineering, data and analytics, digital assurance, cloud and security, CX transformation, and managed services, across industries such as financial services, healthcare, communications, media, entertainment, travel, manufacturing, and technology.

Industry
IT & Software
Company Size
10,000+ employees
Headquarters
Southborough, MA
Year Founded
1996