Usually on Databricks on Azure/AWS, to read files stored on Azure Blob/S3, I would mount the bucket or blob storage and then do the following:
If using Spark:
df = spark.read.format('csv').load('/mnt/my_bucket/my_file.csv', header="true")
If using pandas directly, adding /dbfs to the path:
df = pd.read_csv('/dbfs/mnt/my_bucket/my_file.csv')
I am trying to do the exact same thing on the hosted version of Databricks on GCP. I can mount my bucket and read it with Spark, but I cannot read it with pandas directly: adding the /dbfs prefix does not work, and I get a No such file or directory: ... error.
Has anyone encountered a similar issue? Am I missing something?
Also, when I run
%sh
ls /dbfs
it returns nothing, even though the DBFS browser in the UI shows my mounted buckets and files.
Thanks for the help
This limitation is documented in the list of features not yet released for Databricks on GCP:
DBFS access to local file system (FUSE mount). For DBFS access, the Databricks dbutils commands, Hadoop Filesystem APIs such as the %fs command, and Spark read and write APIs are available. Contact your Databricks representative for any questions.
So you'll need to copy the file to the local disk before reading it with pandas:
dbutils.fs.cp("/mnt/my_bucket/my_file.csv", "file:/tmp/my_file.csv")
df = pd.read_csv('/tmp/my_file.csv')
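If you do this for several files, the copy-then-read step can be wrapped in a small helper. This is only a sketch: read_csv_via_local_copy and its copy_fn parameter are hypothetical names, not part of Databricks or pandas. It assumes copy_fn behaves like dbutils.fs.cp(src, dst) and that the file fits on the driver's local disk.

```python
import os
import shutil
import tempfile

import pandas as pd


def read_csv_via_local_copy(dbfs_path, copy_fn, **read_csv_kwargs):
    """Copy a DBFS file to local disk, then read it with pandas.

    copy_fn is assumed to have the same (src, dst) calling convention as
    dbutils.fs.cp; pass dbutils.fs.cp itself when running in a notebook.
    """
    # Use a fresh temporary directory so concurrent runs do not clash.
    local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, os.path.basename(dbfs_path))
    try:
        # The "file:" scheme targets the driver's local file system.
        copy_fn(dbfs_path, "file:" + local_path)
        return pd.read_csv(local_path, **read_csv_kwargs)
    finally:
        # Remove the local copy once the DataFrame is in memory.
        shutil.rmtree(local_dir, ignore_errors=True)
```

In a notebook this would be called as df = read_csv_via_local_copy("/mnt/my_bucket/my_file.csv", dbutils.fs.cp). Any extra keyword arguments are passed through to pd.read_csv.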