I have two files stored in 'dbfs/tmp/folder/'.
I am trying to zip the files. The code runs without errors, but the created .zip file cannot be seen in the folder. What is the best way to zip two files in Databricks?
Code:
import zipfile

file_paths = ['/dbfs/dbfs/tmp/folder1/test1.parquet',
              '/dbfs/dbfs/tmp/folder1/test2.parquet']
zip_name = 'myzip.zip'
zip_file = zipfile.ZipFile(zip_name, "w")
for file in file_paths:
    zip_file.write(file)
zip_file.close()
It executes with no error, but the zip file cannot be seen under '/dbfs/dbfs/tmp/folder1/'.
By default the file will be created on the local disk of the driver node, but you can't put /dbfs/... as the output destination because of the DBFS limitations described in this answer. What you'll need to do is write the zip file to the local disk first, and then copy it to DBFS with the dbutils.fs.cp command, using file: as the prefix for the local file name:
zip_name = '/tmp/myzip.zip'
zip_file = zipfile.ZipFile(zip_name, "w")
for file in file_paths:
    zip_file.write(file)
zip_file.close()
# copy file from local disk to DBFS...
dbutils.fs.cp(f"file:{zip_name}", zip_name)
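For reference, the same flow can be sketched as a plain Python script runnable outside Databricks. Here shutil.copy stands in for dbutils.fs.cp (which only exists inside a Databricks notebook), and the temp-directory paths and dummy file contents are placeholders for the real Parquet files on DBFS:

```python
# Sketch of the pattern: build the zip on local disk, then copy it to
# the destination. shutil.copy is a stand-in for dbutils.fs.cp.
import os
import shutil
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# Stand-ins for the two Parquet files (dummy bytes, placeholder names).
file_paths = []
for name in ("test1.parquet", "test2.parquet"):
    path = os.path.join(workdir, name)
    with open(path, "wb") as f:
        f.write(b"dummy parquet bytes")
    file_paths.append(path)

# Step 1: write the zip to the driver's local disk.
local_zip = os.path.join(workdir, "myzip.zip")
with zipfile.ZipFile(local_zip, "w") as zf:
    for file in file_paths:
        # arcname keeps only the file name, not the full local path.
        zf.write(file, arcname=os.path.basename(file))

# Step 2: copy it to the destination. On Databricks this would be
#   dbutils.fs.cp(f"file:{local_zip}", "dbfs:/tmp/folder1/myzip.zip")
dest_dir = os.path.join(workdir, "dest")
os.makedirs(dest_dir)
shutil.copy(local_zip, dest_dir)

# Verify the copied archive contains both files.
with zipfile.ZipFile(os.path.join(dest_dir, "myzip.zip")) as zf:
    print(sorted(zf.namelist()))  # ['test1.parquet', 'test2.parquet']
```

Using a with block (context manager) for ZipFile also guarantees the archive is flushed and closed before the copy, which matters because an unclosed zip may be incomplete on disk.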