Job Description

This is a remote position.

We are seeking a Senior Data Engineer to support the ingestion, processing, and synchronization of data across our analytics platform. This role focuses on using Python Notebooks to ingest data via APIs into Microsoft Fabric's Data Lake and Data Warehouse, with some data being synced to a Synapse Analytics database for broader reporting needs.
The ideal candidate will have hands-on experience with API-based data ingestion and modern data architectures, including implementing a Medallion architecture (Bronze, Silver, Gold) for data organization and quality management. Exposure to marketing APIs such as Google Ads, Google Business Profile, and Google Analytics 4 is a bonus.

We welcome applicants globally, but we prefer LATAM-based candidates to ensure smoother collaboration with our existing team.

Key Responsibilities

Build and maintain Python Notebooks to ingest data from third-party APIs
Design and implement Medallion layer architecture (Bronze, Silver, Gold) for structured data organization and progressive data refinement
Store and manage data within Microsoft Fabric's Data Lake and Data Warehouse using the Delta Parquet file format
Set up data pipelines and sync key datasets to Azure Synapse Analytics
Develop PySpark-based data transformation processes across Bronze, Silver, and Gold layers
Collaborate with developers, analysts, and stakeholders to ensure data availability and accuracy
Monitor, test, and optimize data flows for reliability and performance
Document processes and contribute to best practices for data ingestion and transformation

Tech Stack You'll Use
Ingestion & Processing:

Python (Notebooks)
PySpark

Storage & Warehousing:

Microsoft Fabric Data Lake & Data Warehouse
Delta Parquet files

Sync & Reporting:

Azure Synapse Analytics

Cloud & Tooling:

Azure Data Factory, Azure DevOps

Requirements

Strong experience with Python for data ingestion and transformation;
Proficiency with PySpark for large-scale data processing;
Proficiency in working with RESTful APIs and handling large datasets;
Experience with Microsoft Fabric or similar modern data platforms;
Understanding of Medallion architecture (Bronze, Silver, Gold layers) and data lakehouse concepts;
Experience working with Delta Lake and Parquet file formats;
Understanding of data warehousing concepts and performance tuning;
Familiarity with cloud-based workflows, especially within the Azure ecosystem.

Nice to Have

Experience with marketing APIs such as Google Ads or Google Analytics 4;
Familiarity with Azure Synapse and Data Factory pipeline design;
Understanding of data modeling for analytics and reporting use cases;
Experience with AI coding tools;
Experience with Fivetran, Airbyte, or Rivery.

About Curotec

Curotec was founded in 2010 just outside of Philadelphia in an area known as the Philadelphia Main Line. We have a global presence with clients ranging from funded startups to Fortune 100 enterprises spanning a number of vertical industries.

Our work has won numerous awards, and global organizations have recognized us for consistently delivering exceptional business value to our clients. When it comes to digital business solutions, no challenge is too complicated for us to tackle.

Industry

IT & Software

Company Size

51-200 employees

Headquarters

Newtown Square, Pennsylvania

Year Founded

2010

Website

curotec.com
