
What is the correct way to access a workspace file in Databricks?


My Databricks Runtime version is 10.4 LTS. I am trying to read a workspace file with Python's built-in open() function. I tried several different paths, but they all failed.

Suppose my workspace file path is /Workspace/Users/<user-email>/my_result.json. I tried the following paths:

  • without any prefix, /Workspace/Users/<user-email>/my_result.json: failed
  • adding /dbfs, /dbfs/Workspace/Users/<user-email>/my_result.json: failed
  • adding /file, /file/Workspace/Users/<user-email>/my_result.json: failed

Is my path format wrong, or is it simply not possible to access workspace files with Python's open() function?


Solution

  • According to the Databricks documentation (1, 2), workspace files are supported in Databricks Runtime 11.2 and above.

    With Databricks Runtime 11.2 and above, you can create and manage source code files in the Azure Databricks workspace, and then import these files into your notebooks as needed.

    Using the path without a prefix is the correct approach; it works in Databricks Runtime 11.2 and above:

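    # On Databricks Runtime 11.2 and above, workspace files are exposed as local
    # paths under /Workspace, so no /dbfs or /file prefix is needed.
    file_path = "/Workspace/Users/xxx@yyy.com/outdata.json"
    
    try:
        with open(file_path, "r") as file:
            content = file.read()
            print(content)
    except OSError as e:  # e.g. FileNotFoundError if the path is not visible
        print(f"Error: {e}")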
    


    Therefore, you need Databricks Runtime 11.2 or above. Alternatively, upload your .json file to DBFS and open it with the /dbfs prefix, e.g. /dbfs/path_to_file.
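If the same notebook has to run on both 10.4 LTS and newer clusters, the two options can be combined. The following is a minimal sketch, not a Databricks API. It assumes the file has also been uploaded to DBFS (the dbfs:/FileStore/my_result.json location is a hypothetical example) and that the cluster sets the DATABRICKS_RUNTIME_VERSION environment variable to a string beginning with major.minor (e.g. 10.4). The helper names supports_workspace_files, dbfs_to_local and read_json_result are illustrative only.

```python
import json
import os
import re
from typing import Optional

# Workspace files can be opened as local paths starting with Databricks Runtime 11.2.
MIN_WORKSPACE_FILES_RUNTIME = (11, 2)


def supports_workspace_files(runtime_version: str) -> bool:
    """Return True if a version string such as '11.3' or '11.3.x-scala2.12' is >= 11.2."""
    match = re.match(r"(\d+)\.(\d+)", runtime_version)
    if match is None:
        # Unknown or empty version string: assume workspace files are unavailable.
        return False
    return (int(match.group(1)), int(match.group(2))) >= MIN_WORKSPACE_FILES_RUNTIME


def dbfs_to_local(dbfs_path: str) -> str:
    """Map a DBFS location like 'dbfs:/FileStore/x.json' to its /dbfs local path."""
    if dbfs_path.startswith("/dbfs/"):
        return dbfs_path  # already a local /dbfs path
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]  # drop the URI scheme
    return "/dbfs/" + dbfs_path.lstrip("/")


def read_json_result(workspace_path: str, dbfs_copy: str,
                     runtime_version: Optional[str] = None):
    """Open the workspace file directly on 11.2+, otherwise fall back to the DBFS copy."""
    if runtime_version is None:
        # Assumption: Databricks clusters expose the runtime version in this variable.
        runtime_version = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
    if supports_workspace_files(runtime_version):
        path = workspace_path
    else:
        path = dbfs_to_local(dbfs_copy)
    with open(path, "r") as file:
        return json.load(file)
```

On a 10.4 LTS cluster, after uploading the file, a call such as read_json_result("/Workspace/Users/<user-email>/my_result.json", "dbfs:/FileStore/my_result.json") would read the DBFS copy through /dbfs/FileStore/my_result.json; on 11.2 or above it would open the workspace path directly.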