Tags: python, multithreading, zip

Files randomly missing when running shutil.make_archive from multiple threads


I wrote a script that compresses each subdirectory in the path given as a command-line argument and then deletes the original subdirectories. In the single-threaded case (auto_zip()) it works fine, but when I run it multi-threaded with concurrent.futures.ThreadPoolExecutor() (auto_zip_multi_thread()), some files are randomly missing from the resulting zip archives. The percentage of missing files differs each time, and running it repeatedly on the same directory gives different results each run; sometimes the process even completes without any flaws. I can't reproduce it reliably, so what can I do to fix this? I don't see where the difference comes from, because a similar script that unzips the archives and removes the original zips works fine.

import shutil
from concurrent import futures

# `paths` is a generator of subdirectory paths, shared by all four
# workers (its definition is omitted here).
def task(paths):
    while True:
        try:
            path = next(paths)
            shutil.make_archive(path, 'zip', path)
        except StopIteration:
            break

with futures.ThreadPoolExecutor(max_workers=4) as ex:
    for i in range(4):
        ex.submit(task, paths)

Solution

  • make_archive() is not thread-safe: it temporarily cds into root_dir with os.chdir(), and the current working directory is shared by every thread in the process, so concurrent calls corrupt each other's archives. Instead, it is solved by calling the zip command within subprocess.run():

    import subprocess

    def task(paths):
        while True:
            try:
                path = next(paths)
                # Pass the arguments as a list: no shell is involved and
                # paths containing spaces are handled correctly.
                subprocess.run(["zip", "-r", f"{path}.zip", path], check=True)
            except StopIteration:
                break
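
  • Alternatively, a pure-Python variant stays thread-safe by writing the archives with zipfile directly (no working-directory changes) and letting ThreadPoolExecutor.map() hand out the paths, so the worker threads never share a generator. This is a minimal sketch, assuming paths is the same iterable of subdirectory paths as in the question:

    import os
    import zipfile
    from concurrent import futures

    def zip_dir(path):
        # zipfile writes the archive without touching the process-wide
        # working directory, so concurrent calls do not interfere.
        with zipfile.ZipFile(f"{path}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _, files in os.walk(path):
                for name in files:
                    full = os.path.join(root, name)
                    # Store entries relative to the directory being
                    # zipped, matching make_archive's root_dir behaviour.
                    zf.write(full, os.path.relpath(full, path))

    with futures.ThreadPoolExecutor(max_workers=4) as ex:
        # map() dispatches each path from the calling thread, so the
        # workers never race on a shared generator; list() forces
        # completion and re-raises any worker exception.
        list(ex.map(zip_dir, paths))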