Databricks relative paths, Git and Workspace sources, and library functions


We tend to use notebooks for library functions and "import" them using this pattern:

%run ../../common/email_functions

In this email_functions notebook (python, btw), we might have a function such as:

def send_email(params):
    dbutils.notebook.run('../dispatch_email', 0, {"p_from":p_from, "p_to":"", "p_subject":p_subject, "p_content":content_str, "status":process_status})

(that's not the actual function, but simply an illustration of existing code -- this is years of incremental development on a few core principles that we're trying to unwind)

When we execute dbutils.notebook.run, I'm assuming that "relative path" is from the current working directory, not the location of the file making the run call.

I'm concerned that this relative path works fine for the master node, but may not work as expected with the executor nodes. Does anybody have a pattern they use to normalize the current folder path somehow?

Any assistance is greatly appreciated.

Here's some code I've tried, but I'm still getting occasional "NotebookNotFound" errors that I haven't been able to isolate yet:

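    import os

    # notebookPath() returns a workspace path that already starts with "/",
    # so it is appended to /Workspace without an extra separator.
    cur_path = os.path.dirname(
        dbutils.notebook.entry_point.getDbutils()
            .notebook()
            .getContext()
            .notebookPath()
            .get()
    )
    os.chdir(f"/Workspace{cur_path}")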

I'm concerned this isn't really a universal solution.


Solution

  • Yes, the relative path is resolved from the notebook you are calling from, and that same location is used for any further notebook calls.

    To avoid this, pass an absolute path from the calling notebook. At the point where you make the call, compute the absolute paths of the notebooks that email_functions calls.
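One way to compute such an absolute path is to resolve the relative path against the calling notebook's own path. The following is a minimal sketch, not a Databricks API: `resolve_notebook_path` is a hypothetical helper, and the workspace path in the example is made up for illustration.

```python
import posixpath


def resolve_notebook_path(current_notebook_path: str, relative_path: str) -> str:
    """Resolve relative_path against the folder containing current_notebook_path."""
    base_dir = posixpath.dirname(current_notebook_path)
    # normpath collapses the ".." segments into a clean absolute path.
    return posixpath.normpath(posixpath.join(base_dir, relative_path))


# Hypothetical workspace path, for illustration only.
caller = "/Users/someone@example.com/project/jobs/daily/send_report"
dispatch_path = resolve_notebook_path(caller, "../dispatch_email")
print(dispatch_path)
```

In a notebook, `current_notebook_path` would come from the `getContext().notebookPath().get()` call shown in the question, so the result does not depend on the working directory.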

    %run ../../common/email_functions
    

    Inside email_functions you have

    def send_email(params):
        dbutils.notebook.run('../dispatch_email', 0, {"p_from":p_from, "p_to":"", "p_subject":p_subject, "p_content":content_str, "status":process_status})
    

    Instead of ../dispatch_email, use the path passed in from the calling notebook.

    %run ../../common/email_functions $path="path"
    

    Below is the code to get the absolute path dynamically.

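    import os

    workspace_root = "/Workspace/Users/<your_mail.com>/"
    required_paths = ['dispatch_email']
    paths = {}
    for i in required_paths:
        # Start a fresh walk for each name; an os.walk generator can only be consumed once.
        for root, dirs, files in os.walk(workspace_root):
            if i in files:
                # Join folder and name, then drop the leading /Workspace prefix.
                paths[i] = os.path.join(root, i).replace("/Workspace", '', 1)
                break
        # Prints None if the notebook was not found.
        print(paths.get(i))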
    

    And in email_functions:

    p = dbutils.widgets.get("path")
    print(p)
    
    def send_email():
        return dbutils.notebook.run(p,0)
    

    Because %run only accepts literal arguments, you can't pass a variable or parameter to it.

    So you need to use dbutils.
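As a rough sketch of that idea, the function below takes the runner as an argument so the path can be an ordinary variable. The `run_notebook` parameter, the absolute-path check, and `fake_run` are assumptions added for illustration; in a notebook you would pass `dbutils.notebook.run`, which takes a path, a timeout in seconds, and an arguments map.

```python
from typing import Callable, Dict


def send_email(
    run_notebook: Callable[[str, int, Dict[str, str]], str],
    dispatch_path: str,
    params: Dict[str, str],
    timeout_seconds: int = 0,
) -> str:
    """Run the dispatch notebook at an absolute path held in a variable."""
    if not dispatch_path.startswith("/"):
        raise ValueError(f"expected an absolute notebook path, got {dispatch_path!r}")
    return run_notebook(dispatch_path, timeout_seconds, params)


# Stand-in for dbutils.notebook.run so the sketch runs outside Databricks.
def fake_run(path: str, timeout_seconds: int, arguments: Dict[str, str]) -> str:
    return f"ran {path} with {sorted(arguments)}"


result = send_email(
    fake_run,
    "/Users/someone@example.com/common/dispatch_email",  # hypothetical path
    {"p_to": "", "p_subject": "hi"},
)
print(result)
```

Inside Databricks the call would look like `send_email(dbutils.notebook.run, p, {...})`, with `p` holding the absolute path computed by the caller.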

    You can reuse the code above in nested notebook calls to generate the paths for the notebook names in the list.
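For reuse across nested calls, the lookup could be wrapped in a function along these lines. This is only a sketch: `find_notebook_paths` and its parameters are hypothetical names, and it assumes, as the code above does, that notebooks show up under /Workspace as files named without an extension. The demo uses a temporary directory so it runs outside Databricks.

```python
import os
import tempfile
from typing import Dict, Iterable


def find_notebook_paths(
    names: Iterable[str],
    search_root: str,
    strip_prefix: str = "/Workspace",
) -> Dict[str, str]:
    """Map each notebook name to the first matching path found under search_root."""
    wanted = set(names)
    found: Dict[str, str] = {}
    # A single walk serves every name, instead of one walk per name.
    for root, _dirs, files in os.walk(search_root):
        for name in wanted.intersection(files):
            if name not in found:
                full = os.path.join(root, name)
                if full.startswith(strip_prefix):
                    full = full[len(strip_prefix):]
                found[name] = full
        if len(found) == len(wanted):
            break
    return found


# Demo on a temporary folder standing in for the workspace tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "common"))
    open(os.path.join(tmp, "common", "dispatch_email"), "w").close()
    demo = find_notebook_paths(["dispatch_email", "missing"], tmp, strip_prefix=tmp)
    print(demo)
```

Names that are not found are simply absent from the result, so the caller can check for them before calling `dbutils.notebook.run`.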