
SYS PATH in Databricks


I have 100 notebooks that share some common modules. To import the modules, each notebook currently uses the following code snippet:

import sys
sys.path.append('/Workspace/<folder>')
import <module>

Note that "module" is located inside "folder".

Once the cluster is up, the notebooks that contain the above snippet work fine. However, it's very inefficient to add this code to each notebook, since each one runs independently. I tried adding a global init script with the following code:

#!/bin/bash
export PYTHONPATH=${PYTHONPATH}:/Workspace/<folder>

but it doesn't work that way.


Solution

  • According to this documentation, the only way to use workspace files is to append them to the system path (sys.path), because any value set for the PYTHONPATH environment variable is overwritten. You can confirm this in the cluster's init script logs.

    A possible solution is to use the common function itself, as @chen mentioned, or to create a notebook with the following code and run it in each of your notebooks using %run:

    import sys
    
    def load_modules():
        sys.path.append("/Workspace/.../modules/")
        print(sys.path)
    

    Run the notebook like this in each of your notebooks:

    %run /Users/.../modules_load_ntbk
    

    Make sure you provide the correct path to your notebook while running.

    Load the modules:

    load_modules()
    from module1 import tmp
    tmp.hw()
    

    Output:

    (screenshot of the output in the original post)
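The pattern above can be sketched end-to-end outside Databricks. This is a minimal, self-contained illustration that uses a temporary directory as a stand-in for /Workspace/<folder>, and a hypothetical module1/tmp.py defining hw() (the real module contents are not shown in the post):

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for /Workspace/<folder>: a temporary directory
# containing a package "module1" with a submodule "tmp" that defines hw().
folder = Path(tempfile.mkdtemp())
pkg = folder / "module1"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "tmp.py").write_text('def hw():\n    return "hello world"\n')

# The same pattern the notebooks use: append the folder, then import.
sys.path.append(str(folder))
from module1 import tmp

print(tmp.hw())
```

The key point is that the append has to happen in the Python process itself; setting PYTHONPATH in an init script does not survive, as noted above.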
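Since %run executes the helper notebook inside every calling notebook, repeated calls would append the same path over and over. A small guard keeps sys.path clean (the folder path below is the same placeholder used in the question, not a real path):

```python
import sys

def load_modules(folder="/Workspace/<folder>"):
    # Append only if not already present, so repeated %run
    # invocations stay idempotent.
    if folder not in sys.path:
        sys.path.append(folder)

load_modules()
load_modules()
print(sys.path.count("/Workspace/<folder>"))  # 1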