From Code to Production: Building a Full CI/CD + Kubernetes Pipeline from Scratch

Most student projects stop where the README promises they will go: “Dockerized, cloud-ready, CI/CD coming soon.” Lay-Off-Link was different. I wanted an end-to-end MLOps and data platform that behaved like something a team would actually operate: multiple services, real persistence, automated gates on every merge, and a path from a laptop to a cluster that I could explain in an interview without hand-waving.

This post is a project breakdown, not a step-by-step tutorial. I care about why each layer existed, what broke my mental model along the way, and what I would defend in a design review. If you are hiring for platform or backend-adjacent roles, think of it as evidence that I have shipped the boring, load-bearing parts of production systems, not just the demo.

Motivation: Close the Gap Between “It Works on My Machine” and “It Runs There”

Lay-Off-Link combines ML workflows and data plumbing. That combination is unforgiving: you get schema drift, long-running jobs, sensitive artifacts, and services that fail in different ways under load. A single container and a shell script were never going to be enough.

My goal was to enforce a simple contract on the repository:

  • Every change is scrutinized before it touches an environment. Linting, tests, coverage, and security scanning are not “nice-to-haves”; they are merge blockers.
  • Runtime behavior is declared, not improvised. Replicas, storage, and scaling limits are part of the system design, not afterthoughts.
  • Local and remote paths stay aligned. Docker Compose for developer velocity; Kubernetes + Helm for how the system actually runs at scale.

System Architecture (Three Services, One Platform)

At a high level, Lay-Off-Link is three cooperating services on an MLOps/data platform: each owns a slice of the problem (serving, orchestration, data access; exact boundaries depend on how you split the domain), and they communicate over well-defined interfaces rather than shared mutable state.

That split matters for operations. When something degrades, you want blast radius contained, logs that point to a single deployable unit, and the option to scale or roll back independently. Three services is small enough to reason about in a portfolio conversation, but large enough to force real integration work: networking, configuration, secrets, and release ordering all show up.

[Placeholder: Architecture diagram. Three services, data stores, and traffic paths from client/API through to batch/ML components.]

Recruiter-friendly translation: I did not just “use Kubernetes.” I designed a multi-service topology, then implemented the automation and packaging so that topology could be reproduced and evolved safely.

🐳 Docker Compose: Honest Local Development

Compose is often dismissed as a toy. For this project, it was the fastest way to answer a serious question: Can a new contributor stand the full stack up without reading fifteen wiki pages?

Practical reasoning:

  • Parity with container images used in CI and K8s reduced “works in compose, fails in prod” surprises.
  • Service dependencies (startup order, health expectations) became explicit early, which later informed Helm hooks and readiness probes.
  • Iteration speed beat a remote cluster for debugging integration bugs; the cluster is where you validate orchestration, not where you discover a typo in an env var for the first time.
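A minimal sketch of what that explicitness looks like in Compose. The service names, images, and credentials here are hypothetical illustrations, not the project's actual file; the point is that startup order and health expectations are declared, not assumed:

```yaml
# Hypothetical sketch: service and image names are illustrative.
services:
  api:
    image: layofflink/api:dev          # same image lineage CI builds
    ports: ["8000:8000"]
    depends_on:
      db:
        condition: service_healthy     # startup order is explicit, not luck
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      retries: 10
    volumes:
      - dbdata:/var/lib/postgresql/data   # state survives container restarts
volumes:
  dbdata:
```

The same health expectations later map naturally onto Kubernetes readiness probes, which is what keeps the local and remote paths aligned.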

Kubernetes + Helm: Production Shape

Moving to Kubernetes was less about buzzwords and more about encoding operational intent: how many copies run, what happens under traffic spikes, and where state actually lives.

  • Three replicas as a baseline: enough to survive a single-node blip and to exercise real concurrency assumptions (connection pools, idempotency, cache coherence) without pretending to be hyperscale.
  • Horizontal Pod Autoscaler from 2 to 10 pods: the numbers matter less than what they force you to think about: startup time, graceful shutdown, downstream rate limits, and whether your metrics reflect user pain or CPU noise.
  • 5Gi persistent storage: acknowledges that ML/data workloads do not fit purely ephemeral disks. PVCs push you into backup mental models, storage classes, and the difference between “crash-only” and “durable” semantics.
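The autoscaling and storage figures above translate into roughly this kind of manifest. Resource names and the CPU threshold are illustrative assumptions, not the project's real values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api                        # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold, not a universal answer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-artifacts            # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi                 # durable storage for ML/data artifacts
```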

Helm gave me parameterized releases: the same chart skeleton could stretch across environments with different limits, images, and feature flags, without copy-pasting YAML until it rots.
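A sketch of the values-file shape that makes parameterized releases work; the keys and versions are invented for illustration:

```yaml
# values.yaml (sketch): per-environment knobs the templates read.
replicaCount: 3
image:
  repository: layofflink/api   # hypothetical repository
  tag: "1.4.2"
autoscaling:
  minReplicas: 2
  maxReplicas: 10
persistence:
  size: 5Gi
```

An environment then becomes an overlay rather than a fork: something like helm upgrade --install api ./chart -f values.prod.yaml, with the chart templates reading these values instead of hardcoding them.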

[Placeholder: Screenshot of kubectl get pods / HPA status or Helm release overview showing replicas and resources.]

Terraform: Infrastructure as the Source of Truth

Kubernetes describes how workloads run; Terraform describes what cloud primitives exist around them. I reached for Terraform when click-ops would have been faster short-term but expensive long-term: drift, tribal knowledge, and “who flipped this switch?” incidents.

The mature takeaway: Terraform is not magic; it is a contract. If your modules are vague, you get vague infrastructure. I optimized for reviewable plans and small, incremental applies rather than one giant blast of state on a tired Friday night.
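To give a flavor of what “reviewable plans, small applies” means in practice, here is a hedged HCL sketch. The provider, resource type, naming, and tags are assumptions for illustration, not the project's real modules:

```hcl
# Sketch: an explicit input instead of an implied environment.
variable "environment" {
  type        = string
  description = "Deployment environment (e.g. staging, prod)"
}

# Illustrative: a single artifact bucket whose name and ownership
# are visible in every plan, so "who flipped this switch?" has an answer.
resource "aws_s3_bucket" "ml_artifacts" {
  bucket = "layofflink-${var.environment}-artifacts"

  tags = {
    ManagedBy   = "terraform"
    Environment = var.environment
  }
}
```

Small resources with explicit inputs keep each terraform plan short enough to actually read before applying.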

[Placeholder: Diagram or screenshot of Terraform dependency graph or plan summary for core resources.]

GitHub Actions: Seven Jobs, One Pipeline

CI/CD here is deliberately multi-stage. The point is to fail fast on cheap checks and only spend GPU minutes, registry pushes, or cluster-facing steps when the codebase has already proven it is not obviously broken.

  • Lint & static checks: normalize style and catch foot-guns before tests run. Keeps review focused on design, not formatting debates.
  • Unit / service tests: validate logic in isolation per component. Fast signal on regressions without standing up the world.
  • Integration coverage: exercise cross-service contracts. Where “works alone” meets “works together.”
  • Coverage gate (80%): enforce a floor on testable surface area. Prevents silent growth of untested paths.
  • Image builds (5 images): produce immutable artifacts for each service and its dependencies. Same bits from CI to cluster; reproducible rollbacks.
  • Security scan (Trivy): CVE and misconfiguration signal on images. Shifts a class of bugs left of production.
  • Performance (k6): load-smoke critical paths. Catches regressions unit tests will not see.
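The fail-fast ordering described above can be sketched as a workflow skeleton. Job names, make targets, paths, and the five image names are hypothetical stand-ins:

```yaml
# Sketch of the job graph; names and commands are illustrative.
name: ci
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint
  test:
    needs: lint      # cheap checks gate the expensive ones
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
  build:
    needs: test      # images are only built from code that already passed
    runs-on: ubuntu-latest
    strategy:
      matrix:
        image: [api, worker, trainer, migrator, gateway]   # hypothetical names
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t layofflink/${{ matrix.image }}:${{ github.sha }} services/${{ matrix.image }}
```

The needs: edges are the whole point: registry pushes and cluster-facing steps never spend minutes on a commit that failed a five-second lint.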

[Placeholder: Screenshot of GitHub Actions workflow graph showing seven jobs and their pass/fail status on a representative PR.]

Across the suite, the repository carries 65+ tests with 80% coverage enforced. Numbers are not virtue by themselves; the discipline is. Coverage as a gate only works if you refuse to game it with meaningless asserts. I used it as a forcing function to keep new modules honest, especially around error handling and data edge cases where ML systems love to fail quietly.
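One common way to make the 80% floor a hard gate rather than a dashboard number, assuming a Python toolchain with coverage.py (the project's actual mechanism may differ):

```toml
# pyproject.toml fragment (sketch): coverage.py fails the run,
# and therefore the CI job, if total coverage drops below the floor.
[tool.coverage.report]
fail_under = 80
show_missing = true   # print uncovered lines so the gap is actionable
```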

Security and Performance as First-Class Citizens

Trivy on five images sounds repetitive until you remember that each image has a different dependency footprint. A vulnerability in a training utility container might never appear in an API-only image. Scanning each build artifact respects that reality.
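Per-image scanning fits naturally as a matrix job alongside the builds. This sketch uses the aquasecurity/trivy-action with the same hypothetical image names as before:

```yaml
# Sketch: scan each built image; fail on high/critical findings.
  scan:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      matrix:
        image: [api, worker, trainer, migrator, gateway]   # hypothetical names
    steps:
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: layofflink/${{ matrix.image }}:${{ github.sha }}
          exit-code: "1"              # non-zero exit fails the job
          severity: HIGH,CRITICAL     # signal, not noise
```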

k6 was my antidote to false confidence from green unit tests. Throughput and latency tell you whether timeouts, connection pools, and autoscaling policies actually match user behavior, or whether you have built a system that looks healthy while queuing requests to death.
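A sketch of what a k6 load-smoke with CI-failing thresholds looks like (it runs under the k6 runtime, not plain Node). The endpoint, stages, and limits are illustrative, not the project's real targets:

```javascript
// Illustrative k6 script: ramp load, then fail CI if latency or errors regress.
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },  // ramp up virtual users
    { duration: '1m',  target: 20 },  // hold steady load
    { duration: '15s', target: 0 },   // ramp down
  ],
  thresholds: {
    // these make "slow but green" an impossible state
    http_req_duration: ['p(95)<500'], // ms; illustrative budget
    http_req_failed:   ['rate<0.01'],
  },
};

export default function () {
  http.get('https://staging.example.com/api/health');  // hypothetical path
  sleep(1);
}
```

Thresholds are what turn a load test from a report into a gate: a queueing system that looks healthy on CPU graphs still fails on p95 latency.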

[Placeholder: Screenshot of k6 summary output (latency percentiles, iterations) or Trivy report excerpt.]

What I Learned About Real Production Systems

  • Production is mostly clarity under change. The hard problems were naming, boundaries, defaults, and what happens when two “obviously correct” configs disagree.
  • Automation exposes assumptions. Every brittle script I wrote eventually became a flaky job. Stable pipelines favor boring patterns and explicit inputs.
  • Scaling knobs are policy decisions. Choosing 2 to 10 pods and three replicas is as much about cost, observability, and team cognitive load as it is about raw QPS.
  • Tests and scans are communication tools. They tell the next person, including future me, what I believed was non-negotiable about correctness and safety.

Closing

Lay-Off-Link started as an MLOps and data platform idea and became a lesson in how much engineering lives outside the model code. The stack (Docker Compose, Kubernetes, Helm, Terraform, and a seven-job GitHub Actions pipeline with linting, 65+ tests, enforced coverage, multi-image builds, Trivy, and k6) is not a checklist I chased for buzzwords. It is the minimum credible story for this system: multiple services, durable state, and the expectation that every merge either proves itself or gets stopped.

If you are building something similar, borrow the reasoning before the tools. The tools are interchangeable; the habits (declarative infra, gated CI, and honest local-to-prod parity) are what persist.