So I am completely new to the terraform and I found that by using this in terraform main.tf I can create Azure Databricks infrastructure:
resource "azurerm_databricks_workspace" "bdcc" {
depends_on = [
azurerm_resource_group.bdcc
]
name = "dbw-${var.ENV}-${var.LOCATION}"
resource_group_name = azurerm_resource_group.bdcc.name
location = azurerm_resource_group.bdcc.location
sku = "standard"
tags = {
region = var.BDCC_REGION
env = var.ENV
}
}
And I also found here
That by using this I can even create particular notebook in this Azure DataBricks infrastructure:
resource "databricks_notebook" "notebook" {
content_base64 = base64encode(<<-EOT
# created from ${abspath(path.module)}
display(spark.range(10))
EOT
)
path = "/Shared/Demo"
language = "PYTHON"
}
But since I am new to this, I am not sure in what order I should put those pieces of code together.
It would be nice if someone could point me to the full example of how to create notebook via terraform on Azure Databricks.
Thank you beforehand!
In general you can put these objects in any order - it's a job of the Terraform to detect dependencies between the objects and create/update them in the correct order. For example, you don't need to have depends_on
in the azurerm_databricks_workspace
resource, because Terraform will find that it needs resource group before workspace could be created, so workspace creation will follow the creation of the resource group. And Terraform is trying to make the changes in the parallel if it's possible.
But because of this, it's becoming slightly more complex when you have workspace resource together with workspace objects, like, notebooks, clusters, etc. As there is no explicit dependency, Terraform will try create notebook in parallel with creation of workspace, and it will fail because workspace doesn't exist - usually you will get a message about authentication error.
The solution for that would be to have explicit dependency between notebook & workspace, plus you need to configure authentication of Databricks provider to point to newly created workspace (there are differences between user & service principal authentication - you can find more information in the docs). At the end your code would look like this:
resource "azurerm_databricks_workspace" "bdcc" {
name = "dbw-${var.ENV}-${var.LOCATION}"
resource_group_name = azurerm_resource_group.bdcc.name
location = azurerm_resource_group.bdcc.location
sku = "standard"
tags = {
region = var.BDCC_REGION
env = var.ENV
}
}
provider "databricks" {
host = azurerm_databricks_workspace.bdcc.workspace_url
}
resource "databricks_notebook" "notebook" {
depends_on = [azurerm_databricks_workspace.bdcc]
...
}
Unfortunately, there is no way to put depends_on
on the provider level, so you will need to put it into every Databricks resource that is created together with workspace. Usually the best practice is to have a separate module for workspace creation & separate module for objects inside Databricks workspace.
P.S. I would recommend to read some book or documentation on Terraform. For example, Terraform: Up & Running is very good intro