Tags: google-cloud-platform, terraform, terraform-provider-gcp

`Error 403: Insufficient regional quota to satisfy request: resource "SSD_TOTAL_GB"` when creating kubernetes cluster with terraform


Hi, I am playing around with Kubernetes and Terraform in a Google Cloud free tier account (trying to use the free $300 credit). Here is my Terraform resource declaration. It is something very standard that I copied from the Terraform resource page; nothing particularly strange here.

resource "google_container_cluster" "cluster" {
  name = "${var.cluster-name}-${terraform.workspace}"
  location = var.region
  initial_node_count = 1
  project = var.project-id
  remove_default_node_pool = true
}

resource "google_container_node_pool" "cluster_node_pool" {
  name       = "${var.cluster-name}-${terraform.workspace}-node-pool"
  location   = var.region
  cluster    = google_container_cluster.cluster.name
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"
    service_account = google_service_account.default.email
    oauth_scopes    = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

This Terraform snippet used to work fine. In order to not burn through the $300 too quickly, at the end of every day I would destroy the cluster with terraform destroy. However, one day the Kubernetes cluster creation just stopped working. Here is the error:

Error: googleapi: Error 403: Insufficient regional quota to satisfy request: resource "SSD_TOTAL_GB": request requires '300.0' and is short '50.0'. project has a quota of '250.0' with '250.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=xxxxxx., forbidden

It looks like something didn't get cleaned up by all the terraform destroy runs, and eventually some quota usage built up so that I am no longer able to create a cluster. I am still able to create a cluster through the Google Cloud web interface (I tried only with Autopilot, in the same location). I am a bit puzzled why this is happening. Is my assumption correct? Do I need to delete something that doesn't get deleted automatically by Terraform? If yes, why? Is there a way to fix this and be able to create the cluster with Terraform again?


Solution

  • I ran into the same issue and I think I figured out what's going on. The crucial thing here is to understand the difference between zonal and regional clusters.

    tl;dr: A zonal cluster operates in only one zone, whereas a regional cluster may be replicated across multiple zones.

    From the docs:

    By default, GKE replicates each node pool across three zones of the control plane's region

    I think this is why we're seeing the requirement go to 300 GB (3 zones × 100 GB), since --disk-size defaults to 100 GB per node.

    The solution is to set the location to a zone rather than a region. Of course, here I'm assuming a zonal cluster would satisfy your requirements (if you need to stay regional, see the sketch after the example below). E.g.

    resource "google_container_cluster" "cluster" {
      name = "${var.cluster-name}-${terraform.workspace}"
      location = "us-central1-f"
      initial_node_count = 1
      project = var.project-id
      remove_default_node_pool = true
    }
    
    resource "google_container_node_pool" "cluster_node_pool" {
      name       = "${var.cluster-name}-${terraform.workspace}-node-pool"
      location   = "us-central1-f"
      cluster    = google_container_cluster.cluster.name
      node_count = 1
    
      node_config {
        ...
      }
    }
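
    If you do need a regional cluster, an alternative (a sketch I haven't verified, based on the quota math above) is to shrink the node boot disks so that 3 zones × disk size stays under the 250 GB quota. node_config accepts a disk_size_gb argument, so e.g. 50 GB per node would only request 150 GB of SSD_TOTAL_GB:

    resource "google_container_node_pool" "cluster_node_pool" {
      name       = "${var.cluster-name}-${terraform.workspace}-node-pool"
      location   = var.region   # keep the regional cluster
      cluster    = google_container_cluster.cluster.name
      node_count = 1

      node_config {
        preemptible  = true
        machine_type = "e2-medium"
        # 3 zones × 50 GB = 150 GB, within the 250 GB SSD_TOTAL_GB quota
        disk_size_gb = 50
        service_account = google_service_account.default.email
        oauth_scopes = [
          "https://www.googleapis.com/auth/cloud-platform"
        ]
      }
    }

    Setting disk_type = "pd-standard" in node_config should also work, since standard persistent disks count against a separate (HDD) quota rather than SSD_TOTAL_GB.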