azure · terraform · databricks · azure-databricks · terraform-provider-databricks

How can I fix Terraform's databricks_global_init_script returning HTTP 503 with an Azure container credentials error on first deploy?


I'm deploying a Databricks workspace on Azure using Terraform, which includes a global init script. Nothing fancy: the workspace is accessible via a private endpoint, and the init script depends_on both the workspace and the private endpoint. The whole process authenticates with an existing service principal.
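The relevant wiring looks roughly like this (resource names are placeholders, not my actual config):

    resource "databricks_global_init_script" "this" {
      name    = "the-init-script.sh"
      source  = "${path.module}/the-init-script.sh"
      enabled = true

      depends_on = [
        azurerm_databricks_workspace.this,
        azurerm_private_endpoint.workspace,
      ]
    }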

When I run terraform apply from scratch, creating the init script fails with:

<log preamble> POST /api/2.0/global-init-scripts
> {
>   "enabled": true,
>   "name": "the-init-script.sh",
>   "position": 0,
>   "script": "some base 64... (1234 more bytes)"
> }
< HTTP/2.0 503 Service Unavailable
< {
<   "error_code": "TEMPORARILY_UNAVAILABLE",
<   "message": "Missing credentials to access Azure container"
< }: timestamp=<a timestamp>

The error persists for a few minutes; after that, a terraform apply against the existing state completes cleanly.

As far as Google is concerned, this is the first time anyone has ever posted this error message on the public internet, and I can't find anything related that points to a similar problem.

It's clearly some sort of propagation issue, given that it just goes away after a couple of minutes, but I'd love to figure out if there's something else I can depend on that will make this plan apply cleanly.


Solution

  • This was already reported in the GitHub issues of the Databricks Terraform provider: in some cases, the workspace's credentials aren't propagated fast enough.

    The reporter was able to solve the problem and get a clean apply by adding a delay between creating the workspace and adding the global init script. Their answer:

    We solved it by adding a time_sleep that depends_on the workspace ID, with a 30-second timeout on create. The global init script then depends_on this time_sleep, so it waits 30 seconds.
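
    In Terraform terms, a minimal sketch of that workaround, assuming the hashicorp/time provider and the same placeholder resource names as in the question:

        terraform {
          required_providers {
            databricks = { source = "databricks/databricks" }
            time       = { source = "hashicorp/time" }
          }
        }

        # Give the newly created workspace time to finish propagating its
        # storage credentials before any init-script API call is made.
        resource "time_sleep" "wait_for_workspace" {
          depends_on = [
            azurerm_databricks_workspace.this,
            azurerm_private_endpoint.workspace,
          ]
          create_duration = "30s"
        }

        resource "databricks_global_init_script" "this" {
          name    = "the-init-script.sh"
          source  = "${path.module}/the-init-script.sh"
          enabled = true

          # Depend on the sleep rather than the workspace directly, so the
          # first apply waits out the propagation delay.
          depends_on = [time_sleep.wait_for_workspace]
        }

    The 30s value is the one from the report; a longer create_duration may be needed if credentials take longer to propagate in your environment.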