How to download a parquet "file" (actually a directory) using the Azure CLI?


I am using az storage fs file download to download the contents of a parquet directory, like this:

az storage fs file download \
   --path myname/1/batch-repo/form/Fulfillment/2022/01/02/batch-form-Fulfillment.parquet/ \
   --account-name my-storage-account -f my-container

The download is attempted, but apparently the az CLI is not aware that this is a parquet directory and cannot handle it, either at the directory level or for individual files:

ValueError: This pipeline didn't have the RawDeserializer policy; can't deserialize

Is there any workaround to download the contents of a parquet file?


Solution

  • I reproduced this on my end and received the same error when downloading a directory with the same command as yours.

    Individual files do download successfully with the command below.

    az storage fs file download -f container --path dir1/part-00004-a9e77425-5fb4-456f-ba52-f821123bd193-c000.snappy.parquet --account-name <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>"
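
If you prefer to stay with the file-level command, a loop over the part files is one possible workaround. The following is only a sketch: the function names are hypothetical, and it assumes that az storage fs file list returns full paths in the name field and that az storage fs file download accepts -d for the local destination. Add --account-key or another authentication option as in the command above.

```shell
#!/bin/sh
# Keep only names that end in ".parquet" (skips markers such as _SUCCESS).
filter_parquet_parts() {
  grep '\.parquet$'
}

# Download every parquet part file found under one remote directory.
# $1 = file system, $2 = remote directory, $3 = local folder, $4 = account name
download_parts() {
  mkdir -p "$3" || return 1
  az storage fs file list -f "$1" --path "$2" --account-name "$4" \
      --query "[].name" -o tsv |
    filter_parquet_parts |
    while IFS= read -r remote_path; do
      # Save each part under the local folder, keeping only its base name.
      az storage fs file download -f "$1" --path "$remote_path" \
          -d "$3/$(basename "$remote_path")" --account-name "$4"
    done
}

# Example call (placeholder values):
# download_parts container dir1 ./dir1-local "<ACCOUNT_NAME>"
```

The filter is kept separate from the download loop so the name matching can be checked without touching the storage account.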
    

    However, to download at the directory level, you must use az storage fs directory download. Below is the complete command that worked for me.

    az storage fs directory download -f container -d folder1 -s dir1 --account-name <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>"
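
Applied to the path from the question, the directory-level command might look like the sketch below. The account, container and source path are copied from the question; the local folder ./parquet-download and the download_dir helper are hypothetical. By default the script only prints the command (DRY_RUN=1), so you can inspect it before running it with DRY_RUN=0. Authentication options such as --account-key are left out and may need to be added.

```shell
#!/bin/sh
# Placeholder values taken from the question; replace with your own.
ACCOUNT_NAME="my-storage-account"
FILE_SYSTEM="my-container"
SRC_DIR="myname/1/batch-repo/form/Fulfillment/2022/01/02/batch-form-Fulfillment.parquet"
DEST_DIR="./parquet-download"   # hypothetical local destination folder

# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) the directory download.
# $1 = file system, $2 = remote directory, $3 = local folder, $4 = account name
download_dir() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo az storage fs directory download -f "$1" -s "$2" -d "$3" --account-name "$4"
  else
    mkdir -p "$3" &&
      az storage fs directory download -f "$1" -s "$2" -d "$3" --account-name "$4"
  fi
}

download_dir "$FILE_SYSTEM" "$SRC_DIR" "$DEST_DIR" "$ACCOUNT_NAME"
```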
    

    [Screenshots omitted: the download results and the structure of my files.]