Lead Data Engineer
Madfish
We are looking for a Lead Data Engineer to join our client’s team on an outstaff basis, helping financial institutions combat money laundering and fraud by building resilient, governed, high-quality data platforms.
You will own and evolve the client’s Databricks + AWS Lakehouse ecosystem, enabling analysts, investigators, and product teams to derive insights into criminal behavior and act decisively.
This is a hands-on leadership role within the client’s engineering team, where you will design advanced data engineering solutions, enforce lakehouse best practices, mentor engineers, and drive quality, observability, and reliability across all data assets.
Key Responsibilities
- Own end-to-end design, build, optimization, and support of Spark/PySpark pipelines on Databricks (batch and streaming).
- Define and enforce Medallion (bronze/silver/gold) architecture standards, schema governance, lineage, and SLAs.
- Build secure and reusable data ingestion flows using NiFi, SFTP/FTPS, APIs, and other sources.
- Architect secure AWS data infrastructure: S3, IAM, KMS, Glue, Lake Formation, EC2/EKS, Lambda, Step Functions, Secrets Manager.
- Implement orchestration with Airflow, Databricks Workflows, Step Functions; standardize DAG patterns (idempotency, retries, observability).
- Champion data quality: expectations, anomaly detection, reconciliation, contract tests.
- Embed lineage & metadata via Unity Catalog, Glue, OpenLineage for audit and regulatory transparency.
- Drive CI/CD for data assets (IaC, notebook/test automation, artifact versioning, semantic tagging).
- Mentor engineers on distributed data performance, Delta Lake optimization, cost/performance trade-offs.
- Collaborate with Data Science, Product, and Compliance teams to translate analytical needs into robust data models.
- Lead technical design reviews and trade-off decisions.
- Support incident response, root-cause investigation, and preventative engineering.
- Drive continuous improvement: backlog triage, sizing, delivery tracking, stakeholder demos.
Requirements
- Expert-level SQL and hands-on experience with Databricks, Snowflake, Python, PySpark.
- Strong background in scalable data models, pipelines, and modern Cloud Lakehouse architecture.
- Proven production experience with Spark/PySpark (clusters, Delta Lake, Photon).
- Experience with Airflow, Databricks Workflows, Step Functions.
- Strong AWS knowledge: S3 layout strategies, IAM, Glue Catalog, Lake Formation, networking, encryption.
- CI/CD experience: Git branching, PR workflows, automated deployments, Terraform/CloudFormation.
- Familiarity with governance & lineage tools (Unity Catalog, OpenLineage, Atlas) and compliance (PII/PCI, retention).
- Strong communication skills and experience working with cross-functional partners (data science, product, compliance, security).
- Incident management experience (on-call, observability dashboards, MTTR reduction).
Nice to Have
- Deep hands-on expertise with Delta Lake, the Databricks Lakehouse platform, and Unity Catalog beyond the core requirements.
- NiFi ingestion pipelines, Hadoop ecosystem familiarity.
- Python packaging for shared libraries.
- OpenTelemetry or similar observability tooling.
- Financial crime / AML domain exposure.