Data Engineer SouthRivers Data
Madfish
Software Engineering, Data Science
Europe
Posted on May 10, 2026
About the Role
We are looking for a Data Engineer to design, build, and operate large-scale data pipelines and lakehouse platforms in an enterprise environment. You will work hands-on with AWS data services, Apache Spark, and Python to deliver reliable, performant, and well-modeled data products that power analytics and downstream applications.
Key Responsibilities
- Design, implement, and maintain robust, reliable, and scalable data pipelines for batch and large-scale processing workloads.
- Build and evolve a Data Lakehouse on AWS using cloud object storage (S3), open table formats, and distributed processing frameworks.
- Develop ETL/ELT workflows in Python and Apache Spark, ensuring performance, cost-efficiency, and maintainability.
- Model data for analytical and reporting use cases, applying dimensional and analytical modeling best practices.
- Write advanced SQL for transformation, optimization, and ad-hoc analysis across large datasets.
- Operate and optimize AWS data services such as EMR, Glue, Athena, and S3-based data lakes.
- Troubleshoot pipeline and platform issues end-to-end, applying system-level thinking to identify root causes and durable fixes.
- Collaborate with analysts, data scientists, and platform engineers to translate business requirements into technical solutions.
- Contribute to data quality, observability, governance, and CI/CD practices for data workloads.
Required Qualifications
- Strong, proven hands-on experience in data engineering within enterprise environments.
- Advanced SQL skills and a solid understanding of analytical and dimensional data modeling.
- Strong hands-on experience with Data Lakehouse or modern data platform concepts: cloud object storage, open table formats (e.g., Delta, Iceberg, Hudi), and distributed processing.
- Strong hands-on experience with AWS data services: EMR, Glue, Athena, and S3-based data lakes.
- Strong hands-on experience with Apache Spark for large-scale data processing.
- Strong Python skills for ETL development, data processing, and automation.
- Demonstrated experience designing, implementing, and maintaining robust and reliable data pipelines in production.
- Very strong analytical, problem-solving, and system-level thinking skills.
Nice to Have
- Experience with workflow orchestration tools (e.g., Airflow, Step Functions).
- Familiarity with infrastructure-as-code (Terraform, CloudFormation) and CI/CD for data workloads.
- Exposure to data governance, cataloging (e.g., AWS Glue Data Catalog, Lake Formation), and data quality frameworks.
- Streaming experience (Kafka, Kinesis, Spark Structured Streaming).