Our apps ran fine, until they didn't. Then sporadic 504s and TCP timeouts started showing up.
Root Causes Identified:
- Misconfigured readiness probes
- Aggressive connection reuse without keepalive
- DNS resolution delays under high pod churn
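For the connection-reuse item, one common remedy is to turn on TCP keepalive for client sockets. Keepalive probes let the kernel detect pooled connections that went dead while idle, instead of the app finding out through a timeout when it reuses one. This is a minimal Python sketch, not the fix used here: the post doesn't say which client or settings were involved. The helper name `enable_tcp_keepalive` and the 30/10/3 values are illustrative assumptions. The tuning constants are platform-specific, so the helper sets them only where Python exposes them.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 30,
                         interval: int = 10, count: int = 3) -> None:
    """Enable TCP keepalive on a socket (hypothetical helper, illustrative values).

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    """
    # Portable switch: ask the kernel to send keepalive probes on idle connections.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The timing knobs are platform-specific (e.g. Linux exposes all three),
    # so only set the ones this Python build actually provides.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

In practice you would call it on each socket right after `socket.create_connection(...)`, or use your HTTP client's own keepalive or idle-timeout settings if it has them. Probing well before any idle timeout on the path (server, load balancer, conntrack) is the point; the exact numbers depend on your environment.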
Tools like tcpdump, dig, and kubectl describe became my best friends. Eventually we moved to Calico as our CNI plugin for more stable networking.

Lesson: Networking issues in Kubernetes often look like app bugs at first glance.