# The Gap Between "Rolling Update" and "Zero Downtime"
Kubernetes rolling updates keep pods available during deployment. But availability ≠ zero dropped requests. In practice, you will drop connections unless your Go service and your K8s config are both tuned correctly.
## The Complete Checklist
### 1. Graceful Shutdown in Go

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: router, // your mux/router
	}

	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Block until Kubernetes (or Ctrl-C) asks us to stop.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
	<-quit

	log.Println("shutting down...")
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatalf("forced shutdown: %v", err)
	}
	log.Println("server exited cleanly")
}
```
### 2. Health Check Endpoints

```go
mux.HandleFunc("/healthz/live", func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})
mux.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
	if err := db.PingContext(r.Context()); err != nil {
		http.Error(w, "db not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
})
```
### 3. Kubernetes Deployment Config

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Never remove a pod before a new one is ready
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Must be > your shutdown timeout
      containers:
        - name: api
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]  # Wait for kube-proxy to drain connections
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 5
            failureThreshold: 3
```
### 4. PodDisruptionBudget

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```
This prevents node drains from taking down too many pods simultaneously.
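A PDB can also be expressed relative to replica count rather than as an absolute floor. A sketch of the alternative form, assuming the same `app: api` labels; which form fits depends on how your replica count scales:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  maxUnavailable: 1  # Allow at most one pod down per voluntary disruption
  selector:
    matchLabels:
      app: api
```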
## The preStop Sleep Trick

The `sleep 5` in `preStop` is not optional. Here's why:

1. Kubernetes sends `SIGTERM` to your pod.
2. Simultaneously, it removes the pod from the Service endpoints.
3. But kube-proxy propagates endpoint changes asynchronously, which takes 2-5 seconds.
4. During those seconds, traffic still routes to your terminating pod.
5. Without `preStop: sleep 5`, your pod starts shutting down before traffic stops arriving.
## Key Takeaways

- `maxUnavailable: 0` is the most important setting
- `preStop: sleep 5` bridges the kube-proxy propagation gap
- `terminationGracePeriodSeconds` must exceed your app shutdown timeout
- A PodDisruptionBudget prevents accidental mass eviction during node maintenance