Kubernetes · DevOps · Golang · GCP

Zero-Downtime Deployments for Go Services on Kubernetes

March 10, 2026 · 7 min read

The Gap Between "Rolling Update" and "Zero Downtime"

Kubernetes rolling updates keep pods available during deployment. But availability ≠ zero dropped requests. In practice, you will drop connections unless your Go service and your K8s config are both tuned correctly.


The Complete Checklist

1. Graceful Shutdown in Go

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: router,
	}

	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Block until Kubernetes sends SIGTERM (or Ctrl-C locally).
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
	<-quit

	log.Println("shutting down...")

	// Give in-flight requests up to 30s to finish before exiting.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatalf("forced shutdown: %v", err)
	}
	log.Println("server exited cleanly")
}
```

2. Health Check Endpoints

```go
// Liveness: is the process alive? Keep this dependency-free so a
// slow database can't get the pod restarted.
mux.HandleFunc("/healthz/live", func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// Readiness: can the pod serve traffic? Failing this removes the pod
// from Service endpoints without restarting it.
mux.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
	if err := db.PingContext(r.Context()); err != nil {
		http.Error(w, "db not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
})
```

3. Kubernetes Deployment Config

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # Never remove a pod before a new one is ready
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Must be > your shutdown timeout
      containers:
        - name: api
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]  # Wait for kube-proxy to drain connections
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 5
            failureThreshold: 3
```

4. PodDisruptionBudget

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```

This prevents node drains from taking down too many pods simultaneously.


The preStop Sleep Trick

The sleep 5 in preStop is not optional. Here's why:

  1. The pod is marked Terminating, and the kubelet begins the shutdown sequence: preStop hook first, then SIGTERM
  2. In parallel, the pod is removed from the Service's endpoints
  3. But kube-proxy propagates endpoint changes asynchronously; propagation takes 2-5 seconds
  4. During those seconds, traffic still routes to your terminating pod
  5. Because preStop runs before SIGTERM, sleep 5 holds the SIGTERM back until the endpoint change has propagated; without it, your app starts draining while requests are still arriving

Key Takeaways

  • maxUnavailable: 0 is the most important setting
  • preStop: sleep 5 bridges the kube-proxy propagation gap
  • terminationGracePeriodSeconds must exceed your app shutdown timeout
  • PDB prevents accidental mass eviction during node maintenance