I'm trying to read a file as a DataFrame in a Synapse Spark notebook using spark.read.format('csv').load('/path/to/file'). The file is located in an ADLS Gen2 account, which is mounted on all Spark pools in the Synapse workspace (ADLS mounted with a workspace scope). The mounted ADLS is accessible through the filesystem APIs like any other file in the cluster filesystem. I double-checked, and I can see the file using os.listdir('/path/to/file/directory'). However, I get the following error when I try to read the file using the Spark API:
Path does not exist: abfss://<container>@<storage account>.dfs.core.windows.net/path/to/file
Given the error message, it looks like spark.read.format().load() is trying to access the ADLS directly, instead of going to the path where the ADLS is mounted. Is there a way for spark.read.format().load() to read a file in the filesystem, instead of using the abfss path?
When you already have a mount point for ADLS Gen2, access it using the format mentioned in this document.
To find jobId, run the code below.
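A minimal sketch of that lookup, assuming the notebook runs on a Synapse Spark pool, where the notebookutils package exposes mssparkutils.env.getJobId(). The get_job_id wrapper and its fallback parameter are hypothetical names added here so the snippet also imports cleanly outside Synapse.

```python
# notebookutils is preinstalled in Synapse Spark notebooks; it is not
# available in a plain Python environment, hence the guarded import.
try:
    from notebookutils import mssparkutils
except ImportError:
    mssparkutils = None  # running outside Synapse


def get_job_id(fallback=None):
    """Return the current Spark job ID, or `fallback` outside Synapse."""
    if mssparkutils is None:
        return fallback
    return mssparkutils.env.getJobId()


job_id = get_job_id()
print(job_id)
```

The printed value is what goes in place of {jobId} in the synfs path.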
You can also get the path using the command below.
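One way to do this, sketched under two assumptions: that mssparkutils.fs.getMountPath() is available in the notebook, and that it returns a local path of the form /synfs/{jobId}/{mount point}. The to_synfs_uri helper is a hypothetical name for illustration, and "/test" stands in for your own mount point.

```python
# Guarded import so the helper can be exercised outside Synapse too.
try:
    from notebookutils import mssparkutils
except ImportError:
    mssparkutils = None


def to_synfs_uri(local_path):
    """Convert a local mount path such as '/synfs/49/test' to 'synfs:/49/test'."""
    prefix = "/synfs/"
    if not local_path.startswith(prefix):
        raise ValueError(f"not a synfs mount path: {local_path!r}")
    return "synfs:/" + local_path[len(prefix):]


if mssparkutils is not None:
    # Assumption: "/test" is the mount point used when the storage was mounted.
    local_path = mssparkutils.fs.getMountPath("/test")
    print(local_path)                # local filesystem path, usable with os.listdir
    print(to_synfs_uri(local_path))  # URI form expected by the Spark reader
```

The local path works with filesystem APIs such as os.listdir, while the Spark reader needs the synfs: form.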
Next, provide this path to load the data.
df = spark.read.load("synfs:/49/test/myFile.csv", format='csv')
This solution also works with ADLS Gen2 mounted with a workspace scope (like the one in the question). If that is the case, replace {jobId} with workspace in the previous paths. For example:
df = spark.read.load("synfs:/workspace/{container name}/{path to file}", format='csv')
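The job-scoped and workspace-scoped paths above differ only in their first segment. A small helper can make that explicit; synfs_uri and its parameters are hypothetical names for illustration and are not part of any Synapse API.

```python
def synfs_uri(mount_point, relative_path, job_id=None):
    """Build a synfs URI for a mounted file.

    job_id=None selects the workspace scope; otherwise the job ID is used
    as the first path segment.
    """
    scope = "workspace" if job_id is None else str(job_id)
    parts = [scope, mount_point.strip("/"), relative_path.strip("/")]
    # Skip empty segments so stray slashes do not produce '//'.
    return "synfs:/" + "/".join(p for p in parts if p)


# Job-scoped mount, matching the earlier example path.
print(synfs_uri("/test", "myFile.csv", job_id=49))

# Workspace-scoped mount; "mycontainer" is a placeholder name.
print(synfs_uri("mycontainer", "data/file.csv"))
```

In a notebook, the result would then be passed to the reader, for example spark.read.load(synfs_uri("/test", "myFile.csv", job_id=49), format='csv').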