Problem
At my current job we have chosen to separate our concerns by running many small GKE clusters, which means we run a lot of them.
As a cost-cutting measure we spin down our dev and test clusters each weekday night for about 8 hours, and over the weekend we leave them shut down altogether. This gives our teams in multiple timezones access to the clusters when they need them, while saving the compute cost of the nodes in the node pools when they are idle. In total this saves 88 of the 168 hours in a week (four 8-hour weeknights plus the 56 hours from midnight Saturday to 8am Monday), or about 52% of the nodepool cost.
Solution
I’ve written a simple GCP function called gke-cluster-nodepool-scaler that can be used in conjunction with Cloud Scheduler and a Pub/Sub topic to scale cluster nodepools up and down.
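The function itself is simple: given the target project, zone, cluster and nodepool from its environment variables, and the desired node count from the "nodes" field of the Pub/Sub message, it resizes the nodepool via the GKE API. The effect is roughly equivalent to running something like the following, where the placeholders mirror the environment variables set in the Terraform below, and NODES stands for the count carried in the message:
# resize a nodepool to an explicit node count
gcloud container clusters resize "${CLUSTER}" \
  --project "${PROJECT_ID}" \
  --zone "${ZONE}" \
  --node-pool "${NODEPOOL}" \
  --num-nodes "${NODES}" \
  --quiet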
Deployment
Example Terraform module:
variable "company_name" {}
variable "project_id" {}
variable "zone" {}
variable "cluster" {}
variable "nodepool" {}
variable "app_version" {}
variable "min_nodes" {}
variable "max_nodes" {}

resource "google_cloudfunctions_function" "gke-cluster-nodepool-scaler" {
  name                  = "gke-cluster-nodepool-scaler"
  source_archive_bucket = google_storage_bucket.gke-cluster-nodepool-scaler.name
  source_archive_object = "app-${var.app_version}.zip"
  available_memory_mb   = 128
  timeout               = 60
  runtime               = "python37"
  entry_point           = "main"

  event_trigger {
    event_type = "google.pubsub.topic.publish"
    resource   = "projects/${var.project_id}/topics/gke-cluster-nodepool-scaler"
  }

  environment_variables = {
    PROJECT_ID = var.project_id
    ZONE       = var.zone
    CLUSTER    = var.cluster
    NODEPOOL   = var.nodepool
  }
}

resource "google_storage_bucket" "gke-cluster-nodepool-scaler" {
  name = "${var.company_name}-gke-cluster-nodepool-scaler"
}

resource "google_pubsub_topic" "gke-cluster-nodepool-scaler" {
  name = "gke-cluster-nodepool-scaler"
}

# scale the cluster down every weekday night
resource "google_cloud_scheduler_job" "gke-cluster-nodepool-scaler-scale-down" {
  name     = "gke-cluster-nodepool-scaler-scale-down"
  schedule = "0 0 * * 2,3,4,5,6"

  pubsub_target {
    topic_name = google_pubsub_topic.gke-cluster-nodepool-scaler.id
    data       = base64encode(jsonencode({ "nodes" = var.min_nodes }))
  }
}

# scale the cluster up every weekday morning
resource "google_cloud_scheduler_job" "gke-cluster-nodepool-scaler-scale-up" {
  name     = "gke-cluster-nodepool-scaler-scale-up"
  schedule = "0 8 * * 1,2,3,4,5"

  pubsub_target {
    topic_name = google_pubsub_topic.gke-cluster-nodepool-scaler.id
    data       = base64encode(jsonencode({ "nodes" = var.max_nodes }))
  }
}
Note: A copy of the manifest can be found here.
Caveats
If Terraform is run during the out-of-hours period when the cluster has been scaled down to zero, it will attempt to return the nodepool to whatever state it was provisioned with. For us this is potentially desired behaviour, since it means a Terraform change has been pushed through our CD systems which we probably want to test. Either way, the scaling process co-exists harmoniously with Terraform: both can adjust the same nodepools without conflict.
I also found that with auto-scaling enabled, the cluster could sometimes end up larger than expected after triggering a resize; I believe the cluster auto-scaler was adding nodes back. In our case auto-scaling is unnecessary in dev and test, so I’ve set the maximum to 1 node per zone to prevent unwanted node creation.
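For reference, the same cap can be applied from the command line; a sketch, where the placeholders mirror the Terraform variables above:
# cap the cluster autoscaler at one node per zone for this nodepool
gcloud container clusters update "${CLUSTER}" \
  --project "${PROJECT_ID}" \
  --zone "${ZONE}" \
  --node-pool "${NODEPOOL}" \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 1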
Testing
There are several ways in which the service can be tested.
It’s possible to execute the GCP function directly. Because it’s a background function triggered by Pub/Sub, gcloud functions call expects a synthetic event whose data field carries the base64-encoded message; something like:
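# invoke the function with a hand-built Pub/Sub event carrying {"nodes":1}
gcloud functions call gke-cluster-nodepool-scaler \
  --data "{\"data\":\"$(echo -n '{"nodes":1}' | base64)\"}"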
Once the function works, test publishing a message onto the Pub/Sub topic; this is exactly what the scheduler jobs (and the script further below) do:
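# publish a scale-up message straight to the topic
gcloud pubsub topics publish gke-cluster-nodepool-scaler \
  --message '{"nodes":1}'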
Then test that scheduler publishing is functioning; Cloud Scheduler jobs can be force-run ahead of their cron schedule:
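# force-run the scale-up job immediately
gcloud scheduler jobs run gke-cluster-nodepool-scaler-scale-up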
Finally, check the function log to confirm the function executed at the correct times:
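# show the most recent executions of the function
gcloud functions logs read gke-cluster-nodepool-scaler --limit 50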
Circumventing the Schedule
If a developer wants to work out of hours and needs to circumvent the usual triggers, they can publish to the topic themselves to scale each node pool back up. A small wrapper script makes this convenient:
cat > ~/bin/gke_scale_nodepools.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail

if [[ $# -ne 2 ]]; then
  echo "usage: $0 <node count> \"<project> ...\""
  exit 1
fi

NODE_COUNT="${1}"
PROJECTS="${2}"
TOPIC="gke-cluster-nodepool-scaler"

# publish a scaling message to the topic in each project
for project in ${PROJECTS}; do
  gcloud --project "${project}" pubsub topics publish "${TOPIC}" --message "{\"nodes\":${NODE_COUNT}}"
done
EOF
chmod +x ~/bin/gke_scale_nodepools.sh
# scale nodepool up
~/bin/gke_scale_nodepools.sh 1 project0
Equally, it’s possible to manually trigger a scale-down for a set of clusters across multiple projects:
# scale nodepool down
~/bin/gke_scale_nodepools.sh 0 "project0 project1 project2 project3"
Conclusion
Along with several other cost-cutting measures, such as using preemptible instances, running two-node clusters (enough to test concurrency while keeping the node count to a minimum), and using the smallest possible instances for our apps, it has been possible to significantly reduce the running cost of our non-production environments.