Kubernetes Health Checks That Stop Restart Storms
By Priyatham Rama Sai
Misconfigured probes took down a payments-adjacent service during traffic spikes: pods fell into endless kill loops because their probes failed while startup was still warming caches. Here is how we separate liveness from readiness without lying to the kubelet.
Liveness versus readiness
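Liveness should answer ‘should we restart?’ Readiness should answer ‘should we route traffic?’ A failed liveness probe makes the kubelet restart the container; a failed readiness probe only removes the pod from Service endpoints. Putting database checks into liveness therefore turns a transient dependency blip into pod churn: every replica fails at once, and restarting them does nothing to fix the database.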
Startup semantics
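Startup probes give slow JVM or migration-heavy boots time to finish without blindly widening liveness thresholds: until a configured startup probe succeeds, the kubelet holds off liveness and readiness checks, and it restarts the container only after roughly failureThreshold × periodSeconds of consecutive failures. Size that budget from the P95 of observed boot times, not from defaults copied from tutorials.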
Observability tie-in
Pair probe outcomes with RED metrics (rate, errors, duration) and saturation signals so operators see failing checks next to load and latency; logs alone rarely explain probe-thrashing incidents.