Tags: python, databricks, mkdocs, aws-databricks

Convert Databricks notebook to .py file in workspace


The actual problem I'm trying to solve is that I'm using mkdocs/mkdocs-material for my documentation, but that tool can't work with notebook-type files.

So, as a clumsy workaround, I'm thinking of adding an intermediate step that creates a copy of each notebook's content as a .py file in the same workspace folder, has mkdocs build off of those copies, and then deletes the copies before pushing.
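As a rough sketch of that copy, build, delete cycle (not a tested Databricks recipe), a context manager can write the temporary .py files and guarantee they are removed afterwards, even if the build fails. The name `temporary_copies` and the `{name: content}` input shape are assumptions made for illustration only.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def temporary_copies(sources, target_dir):
    """Write each {name: content} entry as <name>.py, then delete the files on exit.

    `sources` and `target_dir` are hypothetical inputs: a dict of notebook
    names to exported source text, and the folder the docs build reads from.
    """
    target = Path(target_dir)
    written = []
    try:
        for name, content in sources.items():
            path = target / f"{name}.py"
            path.write_text(content, encoding="utf-8")
            written.append(path)
        # Hand the list of created files to the caller (e.g. to run the build)
        yield written
    finally:
        # Clean up only the files this helper created
        for path in written:
            path.unlink(missing_ok=True)
```

Inside the `with` block you would run the documentation build (for example `mkdocs build`); once the block exits, the copies are gone and nothing extra gets pushed.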

For example, I've got a notebook in my workspace. In the UI it looks like this:

%sql
select * from something

%sql
select * from something_else

def some_dummy_function():
    print('dummy')

When you export a notebook as a Python source file via the GUI, you get this, with all the Databricks marker comments included:

# Databricks notebook source
# MAGIC %sql
# MAGIC select * from something

# COMMAND ----------

# MAGIC %sql
# MAGIC select * from something_else

# COMMAND ----------

def some_dummy_function():
    print('dummy')

I want to get this programmatically, from a notebook in a workspace.

Or if you've got suggestions for the root problem at hand ... all ears.


Solution

  • I cobbled this together using the following question as a reference, mainly for the base64 decode idea:

    String search in all Databricks notebook in workspace level

    and this handy package: https://pypi.org/project/databricks-api/

    # Install the package first, in its own notebook cell:
    # %pip install databricks-api

    import base64

    from databricks_api import DatabricksAPI

    # Reuse the API URL and token of the notebook session running this code
    notebook_context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    databricks_api_instance = DatabricksAPI(
        host=notebook_context.apiUrl().getOrElse(None),
        token=notebook_context.apiToken().getOrElse(None),
    )

    # Export the notebook in SOURCE format; the content comes back base64-encoded
    response = databricks_api_instance.workspace.export_workspace(
        "/Repos/me@my_company.com/my_repo/my_notebook",
        format="SOURCE",
        direct_download=None,
        headers=None,
    )

    notebook_content = base64.b64decode(response["content"]).decode("utf-8")

    # Write the decoded source next to the notebook as a plain .py file
    with open("/Workspace/Repos/me@my_company.com/my_repo/new_file_name.py", "w") as f:
        f.write(notebook_content)
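For the root problem (feeding mkdocs), one option is to skip the .py copy and turn the exported source straight into Markdown. The following is a minimal sketch, assuming the export uses the `# Databricks notebook source` header, `# COMMAND ----------` cell separators, and `# MAGIC` prefixes shown in the question. The function names `split_cells` and `to_markdown` are made up for this example, and only simple magics such as `%sql` and `%md` are handled.

```python
HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"
MAGIC_PREFIX = "# MAGIC"
FENCE = "`" * 3  # built indirectly so this example can itself sit in a fenced block


def split_cells(source):
    """Split exported notebook source into cells, stripping MAGIC prefixes."""
    lines = source.splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]

    cells, current = [], []
    for line in lines:
        if line.strip() == CELL_SEPARATOR:
            cells.append(current)
            current = []
        elif line.startswith(MAGIC_PREFIX):
            # "# MAGIC %sql" -> "%sql"; a bare "# MAGIC" becomes an empty line
            text = line[len(MAGIC_PREFIX):]
            current.append(text[1:] if text.startswith(" ") else text)
        else:
            current.append(line)
    cells.append(current)

    # Trim blank padding around each cell and drop empty cells
    joined = ("\n".join(cell).strip("\n") for cell in cells)
    return [cell for cell in joined if cell.strip()]


def to_markdown(source):
    """Render each cell as a fenced code block (or raw text for %md cells)."""
    blocks = []
    for cell in split_cells(source):
        first, _, rest = cell.partition("\n")
        if first.strip().startswith("%"):
            # Magic cell: the magic name picks the language label
            tokens = first.strip()[1:].split(None, 1)
            magic = tokens[0] if tokens else ""
            inline = tokens[1] if len(tokens) > 1 else ""
            body = "\n".join(part for part in (inline, rest) if part)
            if magic == "md":
                blocks.append(body)
                continue
            blocks.append(f"{FENCE}{magic}\n{body}\n{FENCE}")
        else:
            # Cells without a magic are treated as Python
            blocks.append(f"{FENCE}python\n{cell}\n{FENCE}")
    return "\n\n".join(blocks) + "\n"
```

Passing `notebook_content` from the snippet above to `to_markdown` and writing the result to a `.md` file under the docs folder would give mkdocs something it can render directly. Other magics (for example `%run`) would need their own handling.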