The other day I made a silly mistake when modifying some IAM policies on our terraform environment, I applied a change where I wanted to ADD a policy, when in fact it took this as the ONLY policy to exist so wiped out some of the vital IAM policies for service accounts to run GKE etc. Not my best day to say the least (an lesson learnt!).
Everything has been put back to normal manually for now, as the service account permissions were never set via TF anyway - they're the sort of permissions that are applied when enabling APIs on GCP so its done by them in the background. Our GKE cluster can now be managed again and can autoscale etc.
However, now when I run our terraform plan I receive a 500 error on a resource that was never previously a problem (redacted sensivite info):
2021-09-09T18:47:50.794Z [INFO] provider.terraform-provider-google-beta_v3.60.0_x5: 2021/09/09 18:47:50 [DEBUG] Retry Transport: Finished waiting 4s before next retry: timestamp=2021-09-09T18:47:50.794Z
2021-09-09T18:47:50.794Z [INFO] provider.terraform-provider-google-beta_v3.60.0_x5: 2021/09/09 18:47:50 [DEBUG] Retry Transport: request attempt 5: timestamp=2021-09-09T18:47:50.794Z
2021-09-09T18:47:50.794Z [INFO] provider.terraform-provider-google-beta_v3.60.0_x5: 2021/09/09 18:47:50 [DEBUG] Google API Request Details:
---[ REQUEST ]---------------------------------------
GET /v1/services/servicenetworking.googleapis.com/connections?alt=json&network=projects%2F411211291013%2Fglobal%2Fnetworks%2F**********&prettyPrint=false HTTP/1.1
Host: servicenetworking.googleapis.com
User-Agent: google-api-go-client/0.5 Terraform/1.0.6 (+https://www.terraform.io) Terraform-Plugin-SDK/2.4.4 terraform-provider-google-beta/dev
X-Goog-Api-Client: gl-go/1.14.5 gdcl/20210211
Accept-Encoding: gzip
-----------------------------------------------------: timestamp=2021-09-09T18:47:50.794Z
2021-09-09T18:47:51.601Z [INFO] provider.terraform-provider-google-beta_v3.60.0_x5: 2021/09/09 18:47:51 [DEBUG] Google API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 500 Internal Server Error
Cache-Control: private
Content-Type: application/json; charset=UTF-8
Date: Thu, 09 Sep 2021 18:47:51 GMT
Server: ESF
Vary: Origin
Vary: X-Origin
Vary: Referer
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 0
{
"error": {
"code": 500,
"message": "An internal exception occurred.,
"errors": [
{
"message": "An internal exception occurred.\nHelp Token: Ae-hA1PlQyCLBCgXD3Lle******************************************vhHU8zy1z9h",
"domain": "global",
"reason": "backendError"
}
],
"status": "INTERNAL"
}
}
│ Error: googleapi: Error 500: An internal exception occurred.
│ Help Token: Ae-hA1ONdq************************************m0k, backendError
│
│ with google_service_networking_connection.private_vpc_connection,
│ on vpc.tf line 81, in resource "google_service_networking_connection" "private_vpc_connection":
│ 81: resource "google_service_networking_connection" "private_vpc_connection"
Has anyone had similar happen before? Things I've tried so far:
Update: This was due to a missing permission on the servicenetworking API. The default service account created needed roles/servicenetworking.serviceAgent
permission again after it had been wiped.
More details here