Tags: azure, azure-sdk-python

How to use the Azure Python SDK to provision a Databricks service?


[Previously in this post I asked how to provision a Databricks service without any workspace. Now I'm asking how to provision a service with a workspace, since the first scenario seems unfeasible.]

As a cloud admin I'm asked to write a script using the Azure Python SDK which will provision a Databricks service for one of our big data dev teams.

I can't find much online about Databricks within the Azure Python SDK other than https://azuresdkdocs.blob.core.windows.net/$web/python/azure-mgmt-databricks/0.1.0/azure.mgmt.databricks.operations.html

and

https://azuresdkdocs.blob.core.windows.net/$web/python/azure-mgmt-databricks/0.1.0/azure.mgmt.databricks.html

These pages appear to cover provisioning a workspace, but I haven't been able to put the pieces together yet.
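
For reference, here's a minimal sketch of what I've pieced together from those pages so far. The credentials and names are placeholders, and the list call is only there to confirm the client itself works:

    from azure.common.credentials import ServicePrincipalCredentials
    from azure.mgmt.databricks import DatabricksClient

    # Placeholder service principal credentials and subscription.
    credentials = ServicePrincipalCredentials(
        client_id="<app-id>",
        secret="<app-secret>",
        tenant="<tenant-id>",
    )
    subscription_id = "<subscription-id>"

    client = DatabricksClient(credentials, subscription_id)

    # Listing existing workspaces looks straightforward from the docs;
    # creating a new one is where I'm stuck.
    for ws in client.workspaces.list_by_resource_group("example_rg_name"):
        print(ws.name)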

What am I missing?

EDITS:

Thanks to @Laurent Mazuel and @Jim Xu for their help.

Here's the code I'm running now, and the error I'm receiving:

    client = DatabricksClient(credentials, subscription_id)
    workspace_obj = client.workspaces.get("example_rg_name", "example_databricks_workspace_name")
    WorkspacesOperations.create_or_update(
        workspace_obj,
        "example_rg_name",
        "example_databricks_workspace_name",
        custom_headers=None,
        raw=False,
        polling=True
    )

error:

TypeError: create_or_update() missing 1 required positional argument: 'workspace_name'

I'm a bit puzzled by that error, since I've provided the workspace name as the third argument, which is exactly what the documentation linked above says this method requires.

I also tried the following code:

    client = DatabricksClient(credentials, subscription_id)
    workspace_obj = client.workspaces.get("example_rg_name", "example_databricks_workspace_name")
    client.workspaces.create_or_update(
        workspace_obj,
        "example_rg_name",
        "example_databricks_workspace_name"
    )

Which results in:

 Traceback (most recent call last):
   File "./build_azure_visibility_core.py", line 112, in <module>
     ca_databricks.create_or_update_databricks(SUB_PREFIX)
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/expd_az_databricks.py", line 34, in create_or_update_databricks
     self.databricks_workspace_name
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/azure/mgmt/databricks/operations/workspaces_operations.py", line 264, in create_or_update
     **operation_config
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/azure/mgmt/databricks/operations/workspaces_operations.py", line 210, in _create_or_update_initial
     body_content = self._serialize.body(parameters, 'Workspace')
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/msrest/serialization.py", line 589, in body
     raise ValidationError("required", "body", True)
 msrest.exceptions.ValidationError: Parameter 'body' can not be None.
 ERROR: Job failed: exit status 1

So the exception is raised at line 589 of serialization.py, and I don't see what in my code is causing it. Thanks to all who have been generous enough to assist!


Solution

  • With help from @Laurent Mazuel and support engineers at Microsoft, I have a solution:

    from azure.mgmt.databricks import DatabricksClient

    # Resource ID of the managed resource group that Databricks uses for the
    # workspace's underlying resources.
    managed_resource_group_id = "/subscriptions/" + sub_id + "/resourceGroups/" + managed_rg_name

    client = DatabricksClient(credentials, subscription_id)
    workspace_obj = client.workspaces.get(rg_name, databricks_workspace_name)

    # create_or_update takes the workspace parameters first (a dict works),
    # then the resource group name, then the workspace name; it returns a
    # poller, and wait() blocks until provisioning completes.
    client.workspaces.create_or_update(
        {
            "managedResourceGroupId": managed_resource_group_id,
            "sku": {"name": "premium"},
            "location": location,
        },
        rg_name,
        databricks_workspace_name,
    ).wait()
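
If you'd rather use the typed models than a raw dict, the equivalent call should look roughly like the sketch below. This is untested on my end and assumes the Workspace and Sku classes from azure.mgmt.databricks.models shown in the 0.1.0 reference linked above:

    from azure.mgmt.databricks import DatabricksClient
    from azure.mgmt.databricks.models import Sku, Workspace

    client = DatabricksClient(credentials, subscription_id)

    # Same workspace definition as the dict above, expressed with the models.
    workspace_params = Workspace(
        location=location,
        managed_resource_group_id=managed_resource_group_id,
        sku=Sku(name="premium"),
    )

    poller = client.workspaces.create_or_update(
        workspace_params,
        rg_name,
        databricks_workspace_name,
    )
    created_workspace = poller.result()  # blocks until provisioning finishes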