I have a large zip file that contains a single file inside. I want to unzip that file to a given directory for further processing, and I used this code:
from zipfile import ZipFile

def unzip(zipfile: ZipFile, member, dest: str):
    # member may be a filename (str) or a ZipInfo object
    zipfile.extract(member, dest)
This function is called using:
with ZipFile(file_path, "r") as zip_source:
    unzip(zip_source, zip_source.infolist()[0], extract_path)  # extract_path is defined earlier in the code
Unzipping this large file (> 500 MB) takes a long time, and I would like to optimize this solution.
All the optimizations I found were multiprocessing-based, aimed at making the extraction of multiple files faster; however, my zip contains only a single file, so multiprocessing doesn't seem to be the answer.
You cannot parallelize the decompression of a zip archive containing a single file as long as that file is actually compressed with one of the usual compression algorithms (LZ77/LZW/LZSS). These algorithms are intrinsically sequential: each piece of output can reference previously decoded data, so a chunk cannot be decoded before the chunks that precede it.
Moreover, these decompression methods are known to be slow (often much slower than reading the file from a storage device). This is mainly due to the algorithms themselves: their complexity, and the fact that mainstream processors cannot speed the computation up by a large margin.
Thus, there is no way to decompress the file dramatically faster, although you might find a slightly faster implementation in another library.
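If you stay with the standard library, one small thing worth trying is streaming the member out with a larger copy buffer instead of calling extract(). This is only a sketch under assumptions (the archive holds exactly one member, and the 1 MiB buffer size is a guess; the sequential decompression itself is unchanged):

import shutil
from pathlib import Path
from zipfile import ZipFile

def unzip_streaming(zip_path: str, dest_dir: str) -> Path:
    # Stream the single member to disk in large chunks.
    with ZipFile(zip_path, "r") as zf:
        member = zf.infolist()[0]  # assumption: exactly one file in the archive
        target = Path(dest_dir) / member.filename
        target.parent.mkdir(parents=True, exist_ok=True)
        with zf.open(member) as src, open(target, "wb") as dst:
            # copy in 1 MiB chunks instead of shutil's smaller default buffer
            shutil.copyfileobj(src, dst, length=1024 * 1024)
    return target

Whether this beats extract() at all depends on your storage and Python version; the decompression cost itself stays the same.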