Data Engineer / DataOps
DeepX is looking for an experienced Data Engineer to drive our data integration initiatives. In this role, you will connect, transform, and prepare complex datasets to support centralized reporting and actionable business insights. Leveraging modern cloud-based technologies, data orchestration frameworks, and API integrations, you will play a pivotal role in ensuring our data infrastructure meets the evolving needs of our organization.
Key Responsibilities
- Architect, build, and maintain scalable and reliable ETL/ELT pipelines to integrate data from diverse international sources.
- Engineer data transformations that convert raw, complex data into clean, analysis-ready formats suitable for downstream analytics.
- Leverage the Google Cloud Platform (GCP) suite to build and manage scalable data storage and processing solutions, ensuring optimal security, reliability, and performance.
- Orchestrate complex data workflows using Apache Airflow, developing and maintaining robust DAGs for scheduling and monitoring (a minimal DAG sketch follows this list).
- Troubleshoot and resolve issues within data pipelines and optimize workflow scheduling to guarantee timely data availability.
- Independently integrate with third-party services by interpreting API documentation, managing authentication, and developing custom data extraction solutions.
- Work with Google Analytics 4's BigQuery export, structuring raw event data by flattening nested fields (e.g., event_params, user_properties) into query-optimized tables (see the flattening sketch after this list).
- Partner with our Business Intelligence teams to align data models and pipelines so they feed seamlessly into visualization tools such as Looker Studio, DOMO, and Looker.
- Provide dedicated data support for dashboards, analytical projects, and ad-hoc reporting.
- Integrate and manage modern data connector tools, such as Stitch Data, and stay current with emerging technologies to enhance our data capabilities.
- Collaborate effectively with data analysts, data scientists, and other cross-functional teams to translate business needs into technical specifications.
- Curate and maintain comprehensive documentation for all data workflows, architectural designs, and transformation logic.
- Implement rigorous data validation, monitoring, and testing strategies to ensure data integrity and continuously improve pipeline performance and cost-efficiency.
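To illustrate the Airflow orchestration mentioned above, here is a minimal DAG sketch. It assumes Airflow 2.4+; the DAG id, schedule, and callables are hypothetical placeholders, not part of DeepX's actual pipelines.

```python
# Minimal Airflow DAG sketch: a daily extract -> load workflow.
# The dag_id, schedule, and callables below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_source(**context):
    """Placeholder: pull raw records from a third-party API or connector."""
    ...


def load_to_warehouse(**context):
    """Placeholder: write transformed records to a warehouse table."""
    ...


with DAG(
    dag_id="example_daily_ingest",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ argument
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load  # extract must complete before load runs
```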
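And a minimal sketch of the GA4 flattening referenced above, assuming the standard GA4 `events_*` export schema; the project and dataset names are hypothetical.

```python
# Flattening a GA4 BigQuery export: event_params is a repeated RECORD, so each
# parameter of interest is pulled into a flat column with an UNNEST subquery.
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default GCP credentials

query = """
SELECT
  event_date,
  event_name,
  user_pseudo_id,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `my-project.analytics_123456.events_*`   -- hypothetical project/dataset
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
"""

for row in client.query(query).result():
    print(row.event_date, row.event_name, row.page_location)
```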
Qualifications
- A minimum of 3 years of professional experience in a data engineering role, preferably with exposure to international datasets.
- Deep, hands-on experience with the Google Cloud Platform (GCP) ecosystem.
- Demonstrable expertise in orchestrating data pipelines with Apache Airflow, including DAG development and maintenance.
- Solid background in building production-grade ETL/ELT pipelines and utilizing connector tools like Stitch Data.
- Proven ability to work with APIs, from reading documentation to implementing data extraction logic.
- Experience handling Google Analytics 4 BigQuery exports, specifically with flattening nested data structures.
- Proficiency in SQL and at least one programming language (e.g., Python, Java, or Scala) for data manipulation and automation.
- Familiarity with BI platforms (Looker Studio, DOMO, Looker) and supporting BI team requirements.
- Proficiency with version control systems, particularly Git.
- Strong problem-solving skills with the ability to translate business requirements into technical solutions and optimize complex data processes.
- Excellent communication and collaboration skills, with the ability to work effectively in an international team environment.
- A proactive and detail-oriented mindset with a commitment to data quality and performance.
- English proficiency: Upper-Intermediate or higher.
About DeepX
DeepX is an R&D intensive and innovation-driven consortium that provides Artificial Intelligence-powered Computer Vision solutions for businesses. To find out more about us, please visit: https://deepxhub.com/