python, zip, databricks, azure-databricks

Compress CSV to ZIP in DBFS (Databricks File System)


I'm trying to compress a CSV file, located in an Azure Data Lake, into a ZIP archive. The operation is done with Python code in Databricks, where I created a mount point that maps the Data Lake directly into DBFS.

This is my code:

import os
import zipfile 

csv_path = '/dbfs/mnt/<path>.csv'
zip_path = '/dbfs/mnt/<path>.zip'

with zipfile.ZipFile(zip_path, 'w') as zip:
    zip.write(csv_path)  # zipping the file

But I'm getting this error:

OSError: [Errno 95] Operation not supported

Is there any way to do this?

Thank you in advance.


Solution

  • No, it's not possible to do it the way you did. The main reason is that the local DBFS file API (the /dbfs FUSE mount) has limitations: it doesn't support the random writes that are required when creating a ZIP file (a short sketch after the code below reproduces this).

    The workaround is the following: write the ZIP file to the local disk of the driver node, and then use dbutils.fs.mv to move the file to DBFS, something like this:

    import os
    import zipfile

    csv_path = '/dbfs/mnt/<path>.csv'   # source CSV, read through the FUSE mount
    zip_path = 'dbfs:/mnt/<path>.zip'   # destination; dbutils.fs expects dbfs: URIs, not /dbfs paths
    local_path = '/tmp/my_file.zip'     # temporary ZIP on the driver's local disk

    # Create the archive on local disk, where random writes are supported.
    # ZIP_DEFLATED actually compresses; zipfile's default (ZIP_STORED) only stores.
    with zipfile.ZipFile(local_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=os.path.basename(csv_path))  # don't embed the full /dbfs path
    dbutils.fs.mv(f"file:{local_path}", zip_path)  # move the finished file onto DBFS
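
    The limitation can be reproduced without zipfile: the FUSE mount accepts sequential writes but rejects seeking back and overwriting, which is exactly what zipfile does when it finalizes an entry's header. A minimal sketch, reusing the placeholder mount path (probe.bin is a made-up file name):

    # Sequential writes through /dbfs work, but rewinding and overwriting does not.
    with open('/dbfs/mnt/<path>/probe.bin', 'wb') as f:  # hypothetical probe file
        f.write(b'hello')  # sequential write: fine
        f.seek(0)          # rewind, so the next write is a random write
        f.write(b'H')      # fails with OSError: [Errno 95] Operation not supported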
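
    The same pattern extends to archiving several files at once: build the whole archive on the driver's local disk, then move it to DBFS in one step. A sketch, assuming a hypothetical folder of CSVs under the same mount (my_files.zip is a made-up name):

    import glob
    import os
    import zipfile

    local_path = '/tmp/my_files.zip'
    with zipfile.ZipFile(local_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for f in glob.glob('/dbfs/mnt/<path>/*.csv'):  # hypothetical folder of CSVs
            zf.write(f, arcname=os.path.basename(f))   # store bare file names in the archive
    dbutils.fs.mv(f"file:{local_path}", 'dbfs:/mnt/<path>/my_files.zip')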