I am using az storage fs file download
to download the contents of a Parquet
directory, like this:
az storage fs file download
--path myname/1/batch-repo/form/Fulfillment/2022/01/02/batch-form-Fulfillment.parquet/
--account-name my-storage-account -f my-container
The download was attempted, but apparently the az CLI
is not aware that this is a Parquet directory and cannot handle it, either at the directory level or for individual files:
ValueError: This pipeline didn't have the RawDeserializer policy; can't deserialize
Is there any workaround to download the contents of a Parquet
directory?
I reproduced this on my end and received the same error when downloading a directory with the same command as yours.
Individual files do download with the command below.
az storage fs file download -f container --path dir1/part-00004-a9e77425-5fb4-456f-ba52-f821123bd193-c000.snappy.parquet --account-name <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>"
However, to download at the directory level you must use az storage fs directory download.
Below is the complete command that worked for me.
az storage fs directory download -f container -d folder1 -s dir1 --account-name <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>"
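If you download Parquet directories often, the command above can be wrapped in a small shell function. This is only a sketch: the function name download_parquet_dir and the AZ_CMD variable are hypothetical helpers, not part of the az CLI, and I am assuming your CLI version supports the --recursive flag on az storage fs directory download to pull in nested part files.

```shell
# Hypothetical wrapper around the directory download command shown above.
# Usage: download_parquet_dir <file-system> <source-dir> <dest-dir> <account> [extra az args...]
download_parquet_dir() {
  local file_system="$1" source_dir="$2" dest_dir="$3" account="$4"
  shift 4
  # AZ_CMD is an assumption for illustration: it defaults to the real CLI,
  # and can be set to "echo" to print the arguments instead of running az.
  "${AZ_CMD:-az}" storage fs directory download \
    --file-system "$file_system" \
    --source-path "$source_dir" \
    --destination-path "$dest_dir" \
    --account-name "$account" \
    --recursive \
    "$@"
}
```

For example, download_parquet_dir container dir1 folder1 <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>" should behave like the command above, with any extra arguments (such as the account key) passed straight through to az.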
Below is the structure of my files