I want to access .dcm (DICOM) files stored in a container on ADLS Gen2 from a PySpark notebook in Azure Synapse Analytics. I'm using pydicom to read the files, but I'm getting an error that the file does not exist. Please have a look at the code below.
To create the file path I'm using pathlib:
Path(path_to_dicoms_dir).joinpath('stage_2_train_images/%s.dcm' % pid)
where pid is the id of the dcm image.
To read the DICOM image I'm using either of the following:
d = pydicom.read_file(data['dicom'])
OR
d = pydicom.dcmread(data['dicom'])
where data['dicom'] is the path.
I've checked the path and there is no issue with it: the file exists, and I have all the access rights, since I'm already accessing other files in the directory just above the one that contains these .dcm files. Those other files are CSVs, though, not DICOM.
Error:
FileNotFoundError: [Errno 2] No such file or directory: 'abfss:/@.dfs.core.windows.net//stage_2_train_images/stage_2_train_images/003d8fa0-6bf1-40ed-b54c-ac657f8495c5.dcm'
Questions that I have in mind:
Is this the right storage solution for such image data? If not, should I use blob storage instead?
The ADLS Gen2 storage account works perfectly fine with Synapse, so there is no need to switch to blob storage.
It seems that pydicom is not taking the path correctly.
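You can see what is going wrong by reproducing the path construction locally: pathlib normalizes away the double slash after the URL scheme, so the abfss:// prefix is destroyed and pydicom ends up with a nonexistent local path (note the single slash after abfss: in the error message). A minimal sketch, with illustrative container/account names:

```python
from pathlib import PurePosixPath

# pathlib collapses "//" into "/", mangling the URL scheme
p = PurePosixPath("abfss://mycontainer@myaccount.dfs.core.windows.net") / "stage_2_train_images" / "003.dcm"
print(p)
# abfss:/mycontainer@myaccount.dfs.core.windows.net/stage_2_train_images/003.dcm
```

pydicom then tries to open that string as a local file, which fails with FileNotFoundError.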
You need to mount the ADLS Gen2 account in Synapse so that pydicom will treat the path as an attached drive instead of a URL. Follow the tutorial given by Microsoft, How to mount Gen2/Blob Storage, to do the same.
You first need to create a Linked Service in Synapse, which will store your ADLS Gen2 account connection details. Then use the code below in a notebook to mount the storage account:
mssparkutils.fs.mount(
    "abfss://mycontainer@<accountname>.dfs.core.windows.net",
    "/test",
    {"linkedService": "mygen2account"}
)
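Once mounted, the files are reachable through a local-style path of the form /synfs/{jobId}/{mountPoint} (in recent Synapse runtimes you can confirm the prefix with mssparkutils.fs.getMountPath). A sketch of building that path for a given image id, assuming the mount point "/test" from above and the dataset layout in the question; the helper name is hypothetical:

```python
from pathlib import PurePosixPath

def mounted_dicom_path(job_id: str, mount_point: str, pid: str) -> str:
    # Assumption: Synapse exposes mounts at /synfs/{jobId}/{mountPoint};
    # where available, prefer mssparkutils.fs.getMountPath(mount_point)
    # instead of building the prefix by hand.
    return str(
        PurePosixPath("/synfs")
        / job_id
        / mount_point.lstrip("/")
        / "stage_2_train_images"
        / f"{pid}.dcm"
    )

# In the notebook (job_id comes from mssparkutils.env.getJobId()):
# import pydicom
# d = pydicom.dcmread(mounted_dicom_path(job_id, "/test", pid))
```

Because the path is now an ordinary file-system path rather than an abfss:// URL, pydicom can open it directly.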