How can I upload a folder recursively to Azure Blob Storage? I want to upload a partitioned Parquet dataset, i.e. a directory:
abcd.parquet
├── _SUCCESS
├── myPart=20180101
│ └── part-00179-660f71d6-ed44-41c7-acf0-008724dd923a.c000.gz.parquet
└── myPart=20180102
    └── part-00022-660f71d6-ed44-41c7-acf0-008724dd923a.c000.gz.parquet
The following:
az storage blob upload -f abcd.parquet -c my_container -n abcd
fails with: Is a directory
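That makes sense, since az storage blob upload handles only single files; az storage blob upload-batch is the directory-level variant. A minimal sketch of what I mean (the account name is a placeholder):

# upload-batch walks the local directory and creates one blob per file,
# preserving the relative path (myPart=20180101/part-...) in the blob name
az storage blob upload-batch \
    --account-name mystorageaccount \
    --destination my_container \
    --destination-path abcd.parquet \
    --source abcd.parquet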
It looks like recursive upload is available on Windows using AzCopy: https://stephanefrechette.com/upload-multiple-files-recursively-azure-blob-storage-azure-cli-2-0-macoslinux/#.W3JpGVJCSL4 https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy
It looks like something similar is available for Linux https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux but I also wonder if I should use Spark instead.
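Presumably the recursive copy would look something like this, assuming current AzCopy v10 syntax (the account URL and SAS token are placeholders):

# with --recursive, AzCopy walks abcd.parquet/ and re-creates the
# myPart=.../part-... hierarchy as blob names under the container
azcopy copy "abcd.parquet" \
    "https://mystorageaccount.blob.core.windows.net/my_container?<SAS>" \
    --recursive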
Also, is it possible to convert the directory hierarchy into the filename on upload, i.e. abcd.parquet_dt=2018..._part-....gz.parquet, so that fewer directory listings are required? In the end, the partitioning should still work as intended for Spark after upload to Azure.
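Something like the following shell loop could do the flattening on upload (the '/'-to-'_' rewrite and the container name are just illustrative assumptions):

# hypothetical flattening: rewrite each file's relative path into the blob
# name, replacing '/' with '_' so no nested prefixes are created
cd abcd.parquet
find . -type f -name '*.parquet' | while read -r f; do
    rel="${f#./}"                      # e.g. myPart=20180101/part-....parquet
    flat="abcd.parquet_${rel//\//_}"   # e.g. abcd.parquet_myPart=20180101_part-....parquet
    az storage blob upload -f "$f" -c my_container -n "$flat"
done

Though as far as I can tell, Spark's partition discovery keys off the myPart=... path segments, so fully flattened names would presumably no longer be picked up as partitions; keeping '/' in the blob names costs nothing extra anyway, since Blob Storage is flat and directories are only a naming convention.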
Related: Uploading 10,000,000 files to Azure blob storage from Linux
blobxfer (https://github.com/Azure/blobxfer) is great for syncing the files to Azure recursively.
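An invocation might look like this, assuming the blobxfer 1.x CLI (the account name and key are placeholders):

# upload the local dataset directory under my_container/abcd.parquet
blobxfer upload \
    --storage-account mystorageaccount \
    --storage-account-key "$STORAGE_KEY" \
    --remote-path my_container/abcd.parquet \
    --local-path abcd.parquet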