# The Gap Between "Rolling Update" and "Zero Downtime"
Kubernetes rolling updates keep pods available during deployment. But availability ≠ zero dropped requests. In practice, you will drop connections unless your Go service and your K8s config are both tuned correctly.
## The Complete Checklist
### 1. Graceful Shutdown in Go

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: router, // your mux/router
	}

	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Block until Kubernetes (or Ctrl-C) asks us to stop.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
	<-quit

	log.Println("shutting down...")
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatalf("forced shutdown: %v", err)
	}
	log.Println("server exited cleanly")
}
```
### 2. Health Check Endpoints

```go
mux.HandleFunc("/healthz/live", func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})
mux.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
	if err := db.PingContext(r.Context()); err != nil {
		http.Error(w, "db not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
})
```
### 3. Kubernetes Deployment Config

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Never remove a pod before a new one is ready
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Must be > your shutdown timeout
      containers:
        - name: api
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]  # Wait for kube-proxy to drain connections
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 5
            failureThreshold: 3
```
### 4. PodDisruptionBudget

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```
This prevents node drains from taking down too many pods simultaneously.
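A PDB can also be expressed relative to replica count rather than as an absolute floor. A sketch of the alternative form, assuming the same `app: api` labels; which form fits depends on how your replica count scales:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  maxUnavailable: 1  # Allow at most one pod down per voluntary disruption
  selector:
    matchLabels:
      app: api
```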
## The preStop Sleep Trick

The `sleep 5` in `preStop` is not optional. Here's why:

1. Kubernetes sends `SIGTERM` to your pod.
2. Simultaneously, it removes the pod from the Service endpoints.
3. But kube-proxy propagates endpoint changes asynchronously, which takes 2-5 seconds.
4. During those seconds, traffic still routes to your terminating pod.
5. Without `preStop: sleep 5`, your pod starts shutting down before traffic stops arriving.
## Key Takeaways

- `maxUnavailable: 0` is the most important setting
- `preStop: sleep 5` bridges the kube-proxy propagation gap
- `terminationGracePeriodSeconds` must exceed your app shutdown timeout
- A PodDisruptionBudget prevents accidental mass eviction during node maintenance