Attempting to create Vertex AI Managed Notebook fails with error: Failed to insert a GCE VM


I’m trying to create a number of Vertex AI managed notebooks in GCP (note: these are managed notebooks, not user-managed notebooks). Each one fails with the same error:

2023-07-10T02:36:05.3813609Z │ Error: Error waiting to create Runtime: Error waiting for Creating Runtime: Error code 3, message: operation "projects//locations/australia-southeast1/operations/create-d73db12f-8b7e-48eb-af84-7aadae580bd1" completed with error: %!w(*status.Status=&{{{}   } 3 Http(400) Bad Request; [ExecuteComputeApi] failed.; [VM][:vm-d73db12f-8b7e-48eb-af84-7aadae580bd1] ; Failed to insert a GCE VM   131})
2023-07-10T02:36:05.3814103Z │
2023-07-10T02:36:05.3814380Z │   with module.user2notebook.google_notebooks_runtime.runtime,
2023-07-10T02:36:05.3814746Z │   on notebook_module/notebook_module.tf line 16, in resource "google_notebooks_runtime" "runtime":
2023-07-10T02:36:05.3815095Z │   16: resource "google_notebooks_runtime" "runtime" {
2023-07-10T02:36:05.3815319Z │

  • I have ruled out permissions, as we have elevated the service account being used to manage the deployment all the way up to OWNER of the project.

  • All the required APIs are running in the GCP project, as we can manually create a managed notebook via the console UI in Vertex AI without issue (same settings).

  • I have ruled out issues with the network and subnet being shared with this project from the shared VPC project (again, because the manual creation works fine).

Google Support have said it’s a Terraform issue, so here I am unfortunately.

Here is the module I created to stand up each notebook:

variable "user_name" {
  description = "Name of the user"
  type        = string
}

variable "user_email" {
  description = "Email of the user"
  type        = string
}

variable "machine_type" {
  description = "Machine type for the notebook instance"
  type        = string
}

# Note: the declarations for NETWORK, SUBNET, BU, OWNER and COSTCENTRE
# are not shown in this question.
resource "google_notebooks_runtime" "runtime" {
  name     = "${lower(replace(replace(var.user_email, "@", "-"), ".", "-"))}-notebook-instance"
  location = "australia-southeast1"

  access_config {
    access_type   = "SINGLE_USER"
    runtime_owner = var.user_email
  }

  virtual_machine {
    virtual_machine_config {
      machine_type     = var.machine_type
      internal_ip_only = true
      network          = var.NETWORK
      subnet           = var.SUBNET

      data_disk {
        initialize_params {
          disk_size_gb = "100"
          disk_type    = "PD_STANDARD"
        }
      }

      metadata = {
        app        = "inventory-mlops"
        bu         = var.BU
        owner      = var.OWNER
        costcentre = var.COSTCENTRE
        email      = var.user_email
      }

      labels = {
        app        = "inventory-mlops"
        bu         = var.BU
        owner      = var.OWNER
        costcentre = var.COSTCENTRE
        email      = var.user_email
      }
    }
  }

  timeouts {
    create = "30m"
    delete = "30m"
  }
}

And I call it from main.tf like this:

module "user21notebook" {
  source = "./notebook_module"
  
  user_name = "James Matson"
  user_email = "james.matson@api.net.au"
  machine_type = "n1-standard-1"
}

As far as I can tell, it is set up correctly according to this reference (the only reliable one I can find online for Vertex AI managed notebooks):

https://github.com/terraform-google-modules/terraform-docs-samples/blob/main/vertex_ai/managed_notebooks_runtime/main.tf


Solution

  • Believe it or not, this issue was resolved by removing the labels and metadata maps from virtual_machine_config entirely. I was told that, despite the HashiCorp provider documentation listing them as valid arguments, they are not supported for managed notebooks. Once I removed them and re-ran the deployment, the notebooks were created without errors.
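
For reference, here is a minimal sketch of what the resource looks like after that change. It assumes nothing else in the module was altered: it is the resource from the question with the labels and metadata maps taken out, and it relies on var.user_email, var.machine_type, var.NETWORK and var.SUBNET being declared in the module as before. Treat it as an illustration of the fix rather than a verified configuration for every provider version.

```hcl
# Sketch only: the question's resource with labels and metadata removed.
# Assumes var.user_email, var.machine_type, var.NETWORK and var.SUBNET
# are declared elsewhere in the module, as in the original setup.
resource "google_notebooks_runtime" "runtime" {
  name     = "${lower(replace(replace(var.user_email, "@", "-"), ".", "-"))}-notebook-instance"
  location = "australia-southeast1"

  access_config {
    access_type   = "SINGLE_USER"
    runtime_owner = var.user_email
  }

  virtual_machine {
    virtual_machine_config {
      machine_type     = var.machine_type
      internal_ip_only = true
      network          = var.NETWORK
      subnet           = var.SUBNET

      data_disk {
        initialize_params {
          disk_size_gb = "100"
          disk_type    = "PD_STANDARD"
        }
      }

      # No labels or metadata here: including them is what triggered
      # the "Failed to insert a GCE VM" error in this case.
    }
  }

  timeouts {
    create = "30m"
    delete = "30m"
  }
}
```

The module call in main.tf stays the same, since neither removed map was driven by a module input that the call sets.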