You'll be the engineer who owns the infrastructure platform end to end. That means you're not executing a ticket queue — you're deciding how things should be built, calling out what needs to change, and making it happen. We're at a critical point: the foundations are in place and we need someone who can assess where we are, close the gaps, and get the platform ready for production traffic at scale.

You'll work directly with the architect and development team, so you need to be comfortable making technical decisions, defending them, and explaining trade-offs clearly.

What You'll Work On

- Kubernetes infrastructure across multiple clusters and environments — day-to-day operations, reliability, capacity, and scaling
- CI/CD pipelines — build, test, and deploy flows for a polyglot microservices backend (Go, PHP) and a Nuxt.js frontend
- Infrastructure as Code — Terraform-managed AWS resources; you'll improve what's there and introduce what's missing
- Observability — metrics, logs, tracing, alerting; making sure on-call means something actionable, not noise
- Production readiness — identifying and closing the gap between "works in dev" and "handles real load safely"
- Security and access control — IAM, RBAC, secrets management, least-privilege everywhere
- Database and cache operations — PostgreSQL, Redis, and Cassandra in a managed/cloud context
- Disaster recovery and backup — designing and testing runbooks, not just writing them

What We're Looking For

Must have:
- 3+ years operating Kubernetes in production (not just deploying to it — owning it)
- Strong Terraform skills — you write modules, manage state, review others' code
- AWS experience across compute, networking, storage, and IAM
- GitOps mindset — you've used ArgoCD, Flux, or similar and understand why it matters
- Monitoring and alerting experience with the Prometheus/Grafana stack
- Comfortable with Linux, shell scripting, and diagnosing things without a UI
- Able to read and reason about Go, Python, or similar — you don't need to write application code, but you do need to understand what it's doing

Strong plus:
- Experience with social or media platforms — you've seen what high fan-out writes, real-time messaging, and image/video delivery look like at scale
- Bare-metal or self-hosted experience — understanding what cloud abstractions are hiding is a real advantage
- Familiarity with Karpenter, KEDA, or similar autoscaling tooling
- Experience with Cassandra or other wide-column stores in production
- Security mindset — threat modeling, pen test familiarity, or cloud security certifications

What Good Looks Like in This Role

You join, spend a few weeks understanding the stack, then come back with a clear-eyed view: here's what's solid, here's what's a risk, and here's the order in which we should fix things. You can have that conversation with the architect as a peer, not as someone waiting for direction.

Six months in, the team trusts the platform. Deploys are boring. Incidents have runbooks. Production is not a scary word.

Stack (partial list)

Kubernetes · Terraform · AWS · GitLab CI · ArgoCD · Helm · Prometheus · Grafana · Loki · PostgreSQL · Redis · Cassandra · Go microservices

What We Offer

- Meaningful ownership — this isn't a support role inside a 200-person infra team
- Direct access to decision-makers — no layers between you and the people setting direction
- Remote-friendly, async-first culture
- Competitive compensation based on experience
