Tags: google-cloud-platform, terraform, terraform-provider-gcp

Terraform & GCP: GKE cluster doesn't show the monitoring section (CPU and memory) for workloads (Deployments, StatefulSets)


I've already spent 4 days testing every configuration of the Kubernetes resources in the Terraform GCP provider, and I still can't see metrics for my workloads: CPU and memory never show up. Even a default cluster created through the GUI has this enabled.

Here's my code:

resource "google_container_cluster" "default" {
  provider = google-beta
  name        = var.name
  project     = var.project_id
  description = "Vectux GKE Cluster"
  location    = var.zonal_region
  remove_default_node_pool = true
  initial_node_count       = var.gke_num_nodes
  master_auth {
    #username = ""
    #password = ""
    client_certificate_config {
      issue_client_certificate = false
    }
  }
  timeouts {
    create = "30m"
    update = "40m"
  }
  logging_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }
  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }
}

resource "google_container_node_pool" "default" {
  name       = "${var.name}-node-pool"
  project    = var.project_id
  location   = var.zonal_region
  node_locations = [var.zonal_region]
  cluster    = google_container_cluster.default.name
  node_count = var.gke_num_nodes
 
  node_config {
    preemptible  = true
    machine_type = var.machine_type
    disk_size_gb = var.disk_size_gb
    service_account = google_service_account.default3.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/cloud-platform",
      "compute-ro",
      "storage-ro",
      "service-management",
      "service-control",
    ]
    metadata = {
      disable-legacy-endpoints = "true"
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}


resource "google_service_account" "default3" {
  project      = var.project_id
  account_id   = "terraform-vectux-33"
  display_name = "tfvectux2"
  provider     = google-beta
}

Here's some information on the cluster. When I compare it against the default cluster that has metrics enabled, I see no differences: [screenshot of the cluster details]

And here's the workload view, without the metrics I'd like to see: [screenshot of the workload view]


Solution

  • As I mentioned in the comments, to solve your issue you must add a google_project_iam_binding resource that grants your service account the roles/monitoring.metricWriter role. In the comments I also said you could grant roles/compute.admin, but another test I ran showed that it is not necessary.

    Below is the Terraform snippet I used to create a test cluster with a service account called sa. I changed some fields in the node config. In your case, you need to add the whole google_project_iam_binding resource.
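    Applied to the configuration in the question, the missing piece would look roughly like the sketch below. It assumes the node pool keeps using google_service_account.default3 and that var.project_id is the project hosting the cluster; the resource name default3_metric_writer is hypothetical.

    ```hcl
    # Hypothetical resource name. Grants the question's node-pool service
    # account (default3) permission to write metrics to Cloud Monitoring.
    resource "google_project_iam_binding" "default3_metric_writer" {
      project = var.project_id
      role    = "roles/monitoring.metricWriter"
      members = [
        "serviceAccount:${google_service_account.default3.email}",
      ]
    }
    ```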

    Terraform Snippet

    ### Creating Service Account
    resource "google_service_account" "sa" {
      project      = "my-project-name"
      account_id   = "terraform-vectux-2"
      display_name = "tfvectux2"
      provider     = google-beta
    }
    ### Binding Service Account with IAM
    resource "google_project_iam_binding" "sa_binding_writer" {
      project = "my-project-name"
      role    = "roles/monitoring.metricWriter"
      members = [
        "serviceAccount:${google_service_account.sa.email}" 
        ### in your case it will be "serviceAccount:${google_service_account.your-serviceaccount-name.email}"
      ]
    }
    
    resource "google_container_cluster" "default" {
      provider = google-beta
      name        = "cluster-test-custom-sa"
      project     = "my-project-name"
      description = "Vectux GKE Cluster"
      location    = "europe-west2"
      remove_default_node_pool = true
      initial_node_count       = 1
      master_auth {
        #username = ""
        #password = ""
        client_certificate_config {
          issue_client_certificate = false
        }
      }
      timeouts {
        create = "30m"
        update = "40m"
      }
      logging_config {
        enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
      }
      monitoring_config {
        enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
      }
    }
    
    resource "google_container_node_pool" "default" {
      name       = "test-node-pool"
      project    = "my-project-name"
      location   = "europe-west2"
      node_locations = ["europe-west2-a"]
      cluster    = google_container_cluster.default.name
      node_count = 1
    
      node_config {
        preemptible  = true
        machine_type = "e2-medium"
        disk_size_gb = 50
        service_account = google_service_account.sa.email
        ###service_account = google_service_account.your-serviceaccount-name.email
        oauth_scopes = [
          "https://www.googleapis.com/auth/logging.write",
          "https://www.googleapis.com/auth/monitoring",
          "https://www.googleapis.com/auth/cloud-platform",
          "compute-ro",
          "storage-ro",
          "service-management",
          "service-control",
        ]
        metadata = {
          disable-legacy-endpoints = "true"
        }
      }
    
      management {
        auto_repair  = true
        auto_upgrade = true
      }
    }
    

    My screenshots (images not included): the whole-workload view and the per-node workload view.

    Additional Information

    If you add only roles/compute.admin, you may see metrics for the workload as a whole, but not for each node. With roles/monitoring.metricWriter you can see both the whole-workload metrics and the per-node metrics. So to get what you want, metrics for the workloads on each node, roles/monitoring.metricWriter is all you need.
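    Since the question also enables WORKLOADS in logging_config, the node service account will most likely need permission to write logs as well. Google's GKE hardening guidance has listed a small set of roles for a least-privilege custom node service account; a sketch granting them to the question's default3 account with for_each might look like this. The resource name node_sa_roles is hypothetical, and the role list should be checked against the current GKE documentation before use.

    ```hcl
    # Hypothetical resource name. Roles assumed from GKE's least-privilege
    # guidance for custom node service accounts; verify against current docs.
    resource "google_project_iam_member" "node_sa_roles" {
      for_each = toset([
        "roles/logging.logWriter",                   # workload and system logs
        "roles/monitoring.metricWriter",             # CPU/memory metrics
        "roles/monitoring.viewer",
        "roles/stackdriver.resourceMetadata.writer",
      ])

      project = var.project_id
      role    = each.value
      member  = "serviceAccount:${google_service_account.default3.email}"
    }
    ```

    google_project_iam_member is used here so that each grant only adds this one member to its role instead of replacing the role's member list.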

    You need google_project_iam_binding because creating a service account does not grant it any roles in the project; without a binding, the new service account has no permissions. In other words, your new SA will be listed under IAM & Admin > Service Accounts, but it will have no entry under IAM & Admin > IAM.
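    One caveat: google_project_iam_binding is authoritative for its role, so on apply Terraform sets the members of roles/monitoring.metricWriter in the project to exactly the listed ones and removes anyone else who held that role (for example, another cluster's node service account). If other principals already have the role, the non-authoritative google_project_iam_member resource is a safer alternative. A sketch using the sa account from the snippet above, with a hypothetical resource name:

    ```hcl
    # Hypothetical resource name. Non-authoritative: adds this single member
    # to the role without removing members granted it elsewhere.
    resource "google_project_iam_member" "sa_metric_writer" {
      project = "my-project-name"
      role    = "roles/monitoring.metricWriter"
      member  = "serviceAccount:${google_service_account.sa.email}"
    }
    ```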

    For more information about IAM bindings in Terraform, see the Terraform documentation for the google_project_iam resources.

    Finally, remember that the OAuth scope "https://www.googleapis.com/auth/cloud-platform" lets the nodes call every Google Cloud API; what they can actually do is then limited only by the IAM roles granted to the service account.
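    Because the cloud-platform scope already covers every Google Cloud API, the other scopes in the list add nothing, and Google's general guidance is to keep cloud-platform and control access through the service account's IAM roles. A trimmed node_config for the test node pool might look like the fragment below; it is a sketch, and the rest of the node pool resource would stay unchanged.

    ```hcl
    # Inside resource "google_container_node_pool" "default"; other fields unchanged.
    node_config {
      preemptible     = true
      machine_type    = "e2-medium"
      disk_size_gb    = 50
      service_account = google_service_account.sa.email

      # cloud-platform permits calls to all Google Cloud APIs; effective access
      # is governed by the IAM roles granted to the service account above.
      oauth_scopes = [
        "https://www.googleapis.com/auth/cloud-platform",
      ]

      metadata = {
        disable-legacy-endpoints = "true"
      }
    }
    ```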