Careers

Tezos ecosystem career opportunities

Tezos is the product of many organizations and individuals across the globe working together on an open-source project.

HPC Systems Engineer

Madfish

Software Engineering
Ukraine · Europe
Posted on Nov 19, 2025

We are seeking an experienced HPC Systems Engineer to design and manage the high-performance computing infrastructure supporting our AI workloads — from deep learning training to large-scale data processing for educational analytics.

Responsibilities

  • Architect and maintain HPC clusters and GPU-based compute environments for ML model training.
  • Optimize distributed training pipelines (PyTorch DDP, Horovod, or Ray).
  • Manage job scheduling systems (Slurm, Kubernetes, or Azure Batch) for efficient workload allocation.
  • Ensure scalable data access and high-throughput pipelines between storage and compute nodes.
  • Collaborate with AI/ML and MLOps teams to integrate training pipelines with CI/CD workflows.
  • Implement security, resource monitoring, and cost optimization practices for compute clusters.
  • Automate cluster provisioning and teardown to support on-demand AI workloads.

Requirements

  • 5+ years of experience managing HPC or GPU-based compute infrastructure.
  • Proficiency with Linux, Slurm, Kubernetes, Docker, and Terraform.
  • Hands-on experience with Azure Batch, AWS ParallelCluster, or on-prem HPC setups.
  • Strong understanding of networking, parallel file systems, and GPU performance tuning.
  • Familiarity with data pipelines and model training orchestration.
  • Excellent scripting skills (Bash, Python).