Kubernetes · Cloud Cost · DevOps

5 Kubernetes Mistakes That Are Doubling Your Cloud Bill

QuikSync Team · January 15, 2026 · 4 min read

Kubernetes is great at orchestrating containers. It is also great at spending your money if you are not careful. We run cloud cost audits for clients regularly, and Kubernetes clusters are consistently the biggest source of waste. Here are the five mistakes we see on almost every engagement, each with a concrete fix.

Mistake 1: Over-Provisioned Resource Requests

This is the number one cost driver. Teams set CPU and memory requests based on worst-case estimates, then never revisit them. A pod requesting 2 CPU cores and 4GB of memory but actually using 0.3 cores and 800MB is blocking resources that other pods could use. The cluster autoscaler provisions more nodes to fit the requested resources, and you pay for all of it.

The fix: Install a tool like Goldilocks or the Kubernetes VPA (Vertical Pod Autoscaler) in recommendation mode. Let it observe actual usage for two weeks, then adjust requests to match the 95th percentile of observed usage plus a 20% buffer. On one client project, this single change reduced their node count from 47 to 29.
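In recommendation mode, the VPA observes usage without restarting pods. A minimal sketch, assuming the VPA CRDs are installed and you have a Deployment named `api` (a hypothetical name) in the current namespace:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api          # hypothetical workload name
  updatePolicy:
    updateMode: "Off"  # recommend only; never evict or resize pods
```

After a couple of weeks, `kubectl describe vpa api-vpa` shows the recommended requests to compare against what you currently set.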

Mistake 2: No Horizontal Pod Autoscaler

Running a fixed number of replicas for workloads with variable traffic means you are either over-provisioned during quiet periods or under-provisioned during peaks. Most teams pick a replica count that handles peak traffic and leave it there permanently.

The fix: Configure HPA for every user-facing workload. Set the target CPU utilization to 65-70% and let Kubernetes scale pods up and down automatically. For workloads that need to scale on custom metrics (like queue depth), use KEDA. One client reduced their average pod count from 120 to 45 during off-peak hours, which translated to 3 fewer nodes running overnight.
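As a concrete starting point, here is a sketch of an HPA using the `autoscaling/v2` API, assuming a Deployment named `web` (hypothetical) with CPU requests set:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical workload name
  minReplicas: 3         # floor for availability during quiet hours
  maxReplicas: 30        # ceiling for peak traffic
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65   # the 65-70% target discussed above
```

Note that utilization is computed against the pods' CPU *requests*, which is another reason the right-sizing from Mistake 1 matters.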

Mistake 3: Always-On Dev and Staging Clusters

Production clusters need to run 24/7. Dev and staging clusters do not. Yet on almost every audit, we find dev clusters running through nights, weekends, and holidays. For a cluster with 10 nodes at $200/month per node, running it only during business hours (roughly 25% of the time) saves $18,000 per year.
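The arithmetic behind that figure is worth making explicit. A quick back-of-envelope check in Python, using the numbers from the example above:

```python
# Savings from running a dev cluster only during business hours,
# using the example figures: 10 nodes at $200/month each.
nodes = 10
cost_per_node_month = 200                                  # USD
always_on_monthly = nodes * cost_per_node_month            # $2,000/month
business_hours_fraction = 0.25                             # ~40 of 168 hours/week
scheduled_monthly = always_on_monthly * business_hours_fraction  # $500/month
annual_savings = (always_on_monthly - scheduled_monthly) * 12
print(annual_savings)  # → 18000.0
```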

The fix: Schedule non-production clusters to scale down to zero nodes outside business hours. You can do this with a CronJob that adjusts the cluster autoscaler's min/max settings, or use managed solutions like GKE's cluster scheduling. Keep a small node pool (1-2 nodes) if you need CI/CD to run overnight builds.
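On GKE, one way to implement this is a CronJob that resizes the node pool directly. A sketch, assuming a cluster named `dev-cluster` with a pool named `default-pool` (both hypothetical) and a service account with permission to resize pools:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
spec:
  schedule: "0 20 * * 1-5"          # 8pm on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-scaler   # hypothetical SA with resize rights
          containers:
          - name: scale
            image: google/cloud-sdk:slim
            command: ["gcloud", "container", "clusters", "resize", "dev-cluster",
                      "--node-pool", "default-pool", "--num-nodes", "0", "--quiet"]
          restartPolicy: OnFailure
```

A mirror-image CronJob scheduled for the morning scales the pool back up. Note this particular job must run somewhere that survives the scale-down, such as a small always-on pool or an external scheduler.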

Mistake 4: Not Using Spot Nodes

Spot instances (Spot or preemptible VMs on GCP) cost 60-80% less than on-demand. Many teams avoid them because they worry about interruption, but Kubernetes handles reclamation gracefully: when a spot node is reclaimed, its pods receive a termination notice and are rescheduled onto other nodes. For stateless workloads (most web services, batch jobs, CI runners), this works well.

The fix: Create a spot node pool alongside your on-demand pool. Use node affinity and tolerations to schedule non-critical workloads on spot nodes. Keep your core services (databases, stateful sets) on on-demand nodes. A 70/30 split between spot and on-demand is a good starting point. One client runs 80% of their staging workloads on spot nodes and saves $4,200 per month.
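Steering workloads onto the spot pool comes down to a node selector plus a toleration for the pool's taint. A sketch, assuming the spot nodes carry a hypothetical `node-type=spot` label and a matching `NoSchedule` taint:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 5
  selector:
    matchLabels: {app: batch-worker}
  template:
    metadata:
      labels: {app: batch-worker}
    spec:
      nodeSelector:
        node-type: spot          # hypothetical label on the spot pool
      tolerations:
      - key: node-type           # matches the taint on spot nodes
        operator: Equal
        value: spot
        effect: NoSchedule
      containers:
      - name: worker
        image: example.com/batch-worker:latest   # hypothetical image
```

Tainting the spot pool (rather than only labeling it) is the important half: it keeps workloads that never declared spot tolerance from landing there by accident.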

Mistake 5: Orphaned Persistent Volumes

When a StatefulSet gets deleted, its PersistentVolumeClaims survive by default: Kubernetes deliberately leaves them behind so a recreated StatefulSet can reattach its data. This is the right default for production, but it means old volumes accumulate quietly. We have found clients with dozens of 500GB EBS volumes sitting unused, costing $50/month each.

The fix: Run a monthly audit of PVCs. Any volume not attached to a running pod for 30+ days should be reviewed and likely deleted. Automate this with a simple script that lists unattached PVCs and sends a Slack notification. For non-production environments, also configure automatic cleanup: on recent Kubernetes versions, set the StatefulSet's persistentVolumeClaimRetentionPolicy to delete PVCs when the StatefulSet is removed, and make sure the StorageClass reclaim policy is "Delete" so the underlying cloud volume goes with the PVC.
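The audit itself can be a short script. A sketch of the listing step, assuming kubectl and jq are available and the current context points at the cluster to audit:

```shell
#!/usr/bin/env bash
# List PVCs in a namespace that no pod currently references.
ns="${1:-default}"

# Collect every claim name mounted by any pod in the namespace.
used=$(kubectl get pods -n "$ns" -o json \
  | jq -r '.items[].spec.volumes[]?
           | select(.persistentVolumeClaim)
           | .persistentVolumeClaim.claimName' \
  | sort -u)

# Flag every PVC that does not appear in that set.
kubectl get pvc -n "$ns" -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' \
  | while read -r pvc; do
      echo "$used" | grep -qx "$pvc" || echo "unused PVC: $pvc"
    done
```

"Unused right now" is not the same as "unused for 30+ days", so treat the output as a review list, not a delete list; pods from CronJobs or paused Deployments may reattach later.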

Adding It All Up

On a recent engagement, a client was spending $38,000 per month on their Kubernetes infrastructure across three clusters (prod, staging, dev). After addressing all five issues above, their monthly spend dropped to $21,000. That is a 45% reduction, or over $200,000 in annual savings. The entire optimization took about three weeks of engineering time.

The underlying principle is the same across all five fixes: do not pay for resources you are not using. Kubernetes makes it easy to over-provision because the abstractions hide the cost. Make cost visible, measure actual usage, and right-size everything.