I'm trying to compress a CSV file, located in an Azure Data Lake, into a zip archive. The operation is done with Python code in Databricks, where I created a mount point that links DBFS directly to the Data Lake.
This is my code:
import os
import zipfile
csv_path = '/dbfs/mnt/<path>.csv'
zip_path = '/dbfs/mnt/<path>.zip'
with zipfile.ZipFile(zip_path, 'w') as zip:
    zip.write(csv_path)  # zipping the file
But I'm getting this error:
OSError: [Errno 95] Operation not supported
Is there any way of doing it?
Thank you in advance.
No, it's not possible to do it the way you did. The main reason is that the local DBFS file API has limitations - it doesn't support the random writes that are required when you're creating a zip file.
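You can see the limitation even without zipfile - any seek-then-write on a /dbfs FUSE path hits the same restriction, and that's exactly the kind of access zipfile needs when it goes back to patch the entry headers. A minimal illustration (the path is just a placeholder, and the exact errno can vary by runtime version):
with open('/dbfs/mnt/<path>/probe.bin', 'wb') as f:
    f.write(b'0123456789')
    f.seek(0)      # jump back to the start of the file ...
    f.write(b'X')  # ... and overwrite it - this random write is what /dbfs rejects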
The workaround would be the following - write the zip file to the local disk of the driver node, and then use dbutils.fs.mv to move the file to DBFS, something like this:
import os
import zipfile

csv_path = '/dbfs/mnt/<path>.csv'   # read through the local /dbfs FUSE mount
zip_path = 'dbfs:/mnt/<path>.zip'   # destination for dbutils.fs - a DBFS path, so no /dbfs prefix here
local_path = '/tmp/my_file.zip'     # temporary file on the driver's local disk

# create the archive on the driver's local disk, where random writes are supported
with zipfile.ZipFile(local_path, 'w') as zip:
    zip.write(csv_path)  # zipping the file

# move the finished zip from the driver's local disk onto the mounted storage
dbutils.fs.mv(f"file:{local_path}", zip_path)
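Two small notes on the zipfile side: ZipFile defaults to ZIP_STORED, so pass compression=zipfile.ZIP_DEFLATED if you actually want the CSV compressed, and zip.write(csv_path) records the directory structure of the /dbfs/mnt/... path as the entry name inside the archive - pass arcname if you only want the bare file name. A sketch of the same with-block with both tweaks (variable names as above):
with zipfile.ZipFile(local_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    # store the entry under just the file name instead of the full /dbfs/mnt/... path
    zf.write(csv_path, arcname=os.path.basename(csv_path))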