Tags: apache-spark, databricks, databricks-community-edition

Databricks code does not work anymore with 'directory not found' error


This code is from an SO question asked some 4 years ago. It used to work in a Databricks notebook.

%python
import pandas as pd
from io import StringIO

data = """
CODE,L,PS
5d8A,N,P60490
5d8b,H,P80377
5d8C,O,P60491
"""

df = pd.read_csv(StringIO(data), sep=',')

# Write the frame to DBFS through the /dbfs FUSE mount
df.to_csv('/dbfs/FileStore/NJ/file1.txt')

# Read it back through the same mount
pandas_df = pd.read_csv("/dbfs/FileStore/NJ/file1.txt", header='infer')
print(pandas_df)

Now it does not. The error message is:

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/NJ/file1.txt'

I'm curious what the issue is, since the directory is there.
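
For illustration, here is a quick diagnostic sketch (my addition, not part of the original post) showing the mismatch: the directory is visible through dbutils, but not through plain Python file APIs:

%python
import os

# The DBFS API still sees the directory...
display(dbutils.fs.ls("/FileStore/NJ/"))

# ...but the /dbfs/ FUSE path is not visible to plain Python I/O
print(os.path.exists("/dbfs/FileStore/NJ/"))  # False when the mount is disabled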


Solution

  • Starting from DBR 7.x, the /dbfs/ path (the so-called FUSE mount) is disabled on Databricks Community Edition, most likely for security reasons.

    The current workaround is to use dbutils.fs.cp (or dbutils.fs.mv) to copy files between DBFS and the local file system, and to work with the files locally. (If no URI scheme is specified, a path refers to DBFS by default.) An end-to-end rework of the original snippet is sketched after the copy examples below.

    To copy from DBFS to local file system:

    dbutils.fs.cp("/path-on-dbfs", "file:/local-path")
    

    To copy from local file system to DBFS:

    dbutils.fs.cp("file:/local-path", "/path-on-dbfs")