python, pyspark, databricks, databricks-community-edition

Accessing a file from Python code in Databricks


I am trying to access a model file that I previously copied over via the CLI, using the following code in a notebook at https://community.cloud.databricks.com/:

with open("/dbfs/cat_encoder.joblib", "rb") as f:
    lb_category = joblib.load(f)

Running this, I get:

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/cat_encoder.joblib'

As I said, I had copied the file using the CLI by running:

dbfs cp cat_encoder.joblib dbfs:/cat_encoder.joblib

Then, when I run

databricks fs ls "dbfs:/"

I see the file that I copied.

But if I were to do this in my notebook:

import os

os.chdir('/dbfs')
print(os.listdir('.'))

I see an empty directory instead of the folders and files I see when using the UI or the CLI.

If I write a file into this empty directory from the notebook, that does work, and I then see exactly one file in the directory: the file I just wrote. The problem is that I want to read what I had already put there beforehand.
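
To illustrate the mismatch, here is a quick check I can run in the same notebook (assuming dbutils is available, as it normally is in a Databricks notebook):

# The DBFS API sees the file I copied via the CLI...
print(dbutils.fs.ls("dbfs:/"))

# ...but the local /dbfs path only shows files written from this notebook.
import os
print(os.listdir("/dbfs"))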

It looks as if the local API cannot see what the proverbial other hand is doing with all the datasets and models I have loaded, either by CLI or by UI. So why can I not see these files? Does it have something to do with credentials, and if so, how do I resolve that? Or is it likely something else entirely, such as mounting? I am doing an introductory trial and some basic exercises on my own to learn Databricks, so I am not too familiar with the underlying concepts.


Solution

  • This is a behavior change in Databricks Runtime 7.x on Community Edition (and only there): files in dbfs:/ are no longer available via the local /dbfs/... path. If you want to access a DBFS file locally, copy it from DBFS to the local file system with dbutils.fs.cp('dbfs:/file', 'file:/local-path') (or %fs cp ...) and work with the local copy there.
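
For example, with the file from the question, a minimal sketch could look like this (the /tmp destination is just an assumed local directory on the driver; any local path works):

# Copy from DBFS to the driver's local disk, then load from the local path.
dbutils.fs.cp("dbfs:/cat_encoder.joblib", "file:/tmp/cat_encoder.joblib")

import joblib

with open("/tmp/cat_encoder.joblib", "rb") as f:
    lb_category = joblib.load(f)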