
Dataproc on GKE via Terraform not working (example provided by Terraform doc)


I was running some tests in my GCP project to verify whether I can migrate Dataproc to GKE and keep it up and running while leveraging autoscaling for workloads. However, I have been blocked since the beginning.

I took the examples from the docs, put them together, and got this error message:

╷
│ Error: Unsupported block type
│
│   on ../../modules/poc/main.tf line 46, in resource "google_dataproc_cluster" "dataproc_gke_cluster":
│   46:   virtual_cluster_config {
│
│ Blocks of type "virtual_cluster_config" are not expected here.
╵
ERRO[0003] Terraform invocation failed in 
ERRO[0003] 1 error occurred:
        * exit status 1

According to the docs, virtual_cluster_config is an expected block inside the google_dataproc_cluster resource:

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_cluster#nested_virtual_cluster_config

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster

Here is my full code:


resource "google_container_cluster" "poc_primary_gke" {
  name     = "poc-gke-cluster"
  location = "europe-west1"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = "my-node-pool"
  location   = "europe-west1"
  cluster    = google_container_cluster.poc_primary_gke.name
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = var.service_account_email
    oauth_scopes    = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

resource "google_storage_bucket" "staging_bucket" {
  name          = "staging-bucket-poc"
  location      = "EU"
  force_destroy = true

  uniform_bucket_level_access = true
}

resource "google_dataproc_cluster" "dataproc_gke_cluster" {
  name                          = "gke-dataproc-poc"
  region                        = "europe-west1"
  graceful_decommission_timeout = "120s"

  labels = var.labels

  virtual_cluster_config {
    staging_bucket = google_storage_bucket.staging_bucket.name
    kubernetes_cluster_config {
      kubernetes_namespace = "foobar"

      kubernetes_software_config {
        component_version = {
          "SPARK" : "3.1-dataproc-7"
        }

        properties = {
          "spark:spark.eventLog.enabled" : "true"
        }
      }

      gke_cluster_config {
        # Point at the GKE cluster defined above; this file has no resource named "primary".
        gke_cluster_target = google_container_cluster.poc_primary_gke.id

        node_pool_target {
          node_pool = "dpgke"
          roles     = ["DEFAULT"]

          node_pool_config {
            autoscaling {
              min_node_count = 1
              max_node_count = 6
            }

            config {
              machine_type     = "n1-standard-4"
              preemptible      = true
              local_ssd_count  = 1
              min_cpu_platform = "Intel Sandy Bridge"
            }
          }
        }
      }
    }
  }
}

Has anyone successfully created a Dataproc cluster on GKE via Terraform?


Solution

  • I finally solved the issue.

    Apparently, the problem was the Terraform Google provider version. I thought I had the latest one, but I did not.

    The virtual_cluster_config block does not seem to be available in provider versions below 4.39, so I upgraded to 4.44. Everything works now.
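
To avoid silently resolving to an older provider, the version can be constrained in the root module. The following is a minimal sketch, not the poster's actual configuration: the `required_providers` block and the `>= 4.39.0` constraint are assumptions, with the lower bound taken from the version reported above rather than verified against the provider changelog.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Assumption: 4.39 is the lower bound reported in the answer above;
      # the poster upgraded to 4.44, which satisfies this constraint.
      version = ">= 4.39.0"
    }
  }
}
```

After changing the constraint, running `terraform init -upgrade` re-resolves the provider, and `terraform version` lists the provider version actually in use.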