This is a remote position.
Key Responsibilities
* Assess the existing data pipeline, GCP architecture, data sources, and current limitations.
* Design a BigQuery-based medallion architecture: Bronze, Silver, Gold, and Serving layers.
* Design and/or implement data ingestion pipelines from Cloud Storage and Cloud SQL / MySQL into BigQuery.
* Define BigQuery datasets, tables, partitioning, clustering, and cost-control strategies.
* Support batch, incremental, and on-demand processing patterns.
* Design curated and BI-ready datasets for analytics and reporting.
* Support data quality checks, validation, logging, monitoring, and reprocessing patterns.
* Define integration patterns for Elasticsearch or other search / serving layers.
* Support the architecture for AI enrichment, embeddings, semantic search, and media metadata, where required.
* Prepare technical documentation, architecture diagrams, implementation roadmap, and handover materials.
* Work closely with client engineering, data, DevOps, and product teams.
Requirements
Required Skills
* Strong hands-on experience with Google Cloud data services.
* Strong BigQuery experience, including modeling, optimization, partitioning, clustering, and cost control.
* Experience with Cloud Storage ingestion patterns.
* Experience integrating Cloud SQL / MySQL data into BigQuery.
* Experience building batch and incremental data pipelines.
* Experience with Dataform, dbt, Cloud Composer / Airflow, Dataflow, Cloud Run, or similar tools.
* Strong SQL and data modeling skills.
* Good understanding of lakehouse / medallion architecture.
* Experience with data quality, metadata, lineage, logging, and monitoring.
* Ability to work in an ambiguous consulting environment and translate high-level requirements into practical implementation plans.
* Strong communication skills and client-facing experience.
Preferred Skills
* Experience with Elasticsearch or search index publishing.
* Experience with Power BI-ready datasets or analytical serving layers.
* Experience with Vertex AI, Gemini, embeddings, vector search, or semantic search.
* Experience with media, social media, marketing, campaign, influencer, or audience data.
* Experience with Dataplex, Data Catalog, IAM, policy tags, row-level security, or column-level security.
* Experience with CI/CD for data pipelines.