We are looking for a Data Engineer to design, build, and operate large-scale data pipelines and lakehouse platforms in an enterprise environment. You will work hands-on with AWS data services, Apache Spark, and Python to deliver reliable, performant, and well-modeled data products that power analytics and downstream applications.

Key Responsibilities

Design, implement, and maintain robust, reliable, and scalable data pipelines for batch and large-scale processing workloads.
Build and evolve a Data Lakehouse on AWS using cloud object storage (S3), open table formats, and distributed processing frameworks.
Develop ETL/ELT workflows in Python and Apache Spark, ensuring performance, cost-efficiency, and maintainability.
Model data for analytical and reporting use cases, applying dimensional and analytical modeling best practices.
Write advanced SQL for transformation, optimization, and ad-hoc analysis across large datasets.
Operate and optimize AWS data services such as EMR, Glue, Athena, and S3-based data lakes.
Troubleshoot pipeline and platform issues end-to-end, applying system-level thinking to identify root causes and durable fixes.
Collaborate with analysts, data scientists, and platform engineers to translate business requirements into technical solutions.
Contribute to data quality, observability, governance, and CI/CD practices for data workloads.

Required Qualifications

Strong, proven hands-on experience in data engineering within enterprise environments.
Advanced SQL skills and a solid understanding of analytical and dimensional data modeling.
Strong hands-on experience with Data Lakehouse or modern data platform concepts: cloud object storage, open table formats (e.g., Delta Lake, Apache Iceberg, Apache Hudi), and distributed processing.
Strong hands-on experience with AWS data services: EMR, Glue, Athena, and S3-based data lakes.
Strong hands-on experience with Apache Spark for large-scale data processing.
Strong Python skills for ETL development, data processing, and automation.
Demonstrated experience designing, implementing, and maintaining robust and reliable data pipelines in production.
Strong analytical, problem-solving, and system-level thinking skills.

Nice to Have

Experience with workflow orchestration tools (e.g., Airflow, Step Functions).
Familiarity with infrastructure-as-code (e.g., Terraform, CloudFormation) and CI/CD for data workloads.
Exposure to data governance, cataloging (e.g., AWS Glue Data Catalog, Lake Formation), and data quality frameworks.
Streaming experience (Kafka, Kinesis, Spark Structured Streaming).
