python python-3.x process python-multiprocessing moviepy

Creating main process for a for loop


This program returns the resolution of a video, but since I need it for a large-scale project, I need multiprocessing. I have tried parallel processing using a different function, but that would just run it multiple times without making it efficient. I am posting the entire code. Can you help me create a main process that uses all cores?

import os
from tkinter.filedialog import askdirectory
from moviepy.editor import VideoFileClip


if __name__ == "__main__":
    dire = askdirectory()
    d = dire[:]
    print(dire)
    death = os.listdir(dire)
    print(death)
    for i in death: #multiprocess this loop
        dire = d
        dire += f"/{i}"
        v = VideoFileClip(dire)
        print(f"{i}: {v.size}")

This code works fine, but I need help creating a main process (one that uses all cores) for the for loop alone. Please excuse the variable names; I was angry at multiprocessing. Also, if you have tips on making the code more efficient, I would appreciate them.


Solution

  • You are, I suppose, assuming that every file in the directory is a video clip. I am assuming that processing a video clip is an I/O-bound "process" for which threading is appropriate. Here I have rather arbitrarily created a thread pool of at most 20 threads this way:

    MAX_WORKERS = 20 # never more than this
    N_WORKERS = max(1, min(MAX_WORKERS, len(death))) # at least 1: max_workers=0 raises ValueError
    

    You would have to experiment to find how large MAX_WORKERS can be before performance degrades. The limit might be a low number, not because your system cannot support lots of threads, but because concurrent access to multiple files that may be spread across your disk can be inefficient.

    import os
    from tkinter.filedialog import askdirectory
    from moviepy.editor import VideoFileClip
    from concurrent.futures import ThreadPoolExecutor as Executor
    from functools import partial
    
    
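    def process_video(parent_dir, file):
        # Open the clip, report its resolution, then release the ffmpeg reader.
        v = VideoFileClip(os.path.join(parent_dir, file))
        try:
            print(f"{file}: {v.size}")
        finally:
            v.close()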
    
    
    if __name__ == "__main__":
        dire = askdirectory()
        print(dire)
        death = os.listdir(dire)
        print(death)
        worker = partial(process_video, dire)
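        MAX_WORKERS = 20 # never more than this
        # max_workers must be at least 1, even if the directory is empty
        N_WORKERS = max(1, min(MAX_WORKERS, len(death)))
        with Executor(max_workers=N_WORKERS) as executor:
            # map() returns a lazy iterator, not a list; list() drains it so
            # any exception raised in a worker surfaces here
            results = list(executor.map(worker, death)) # [None, None, ...]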
    
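If you want the sizes back in the main thread instead of printing them from the workers, the worker can return a value and the caller can collect it. The following is only a minimal sketch: `map_in_threads` is a hypothetical helper name, and the built-in `len` stands in for a function that would return `VideoFileClip(path).size`, so the example runs without moviepy installed.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_threads(func, items, max_workers=20):
    """Run func on each item in a thread pool and return (item, result) pairs."""
    items = list(items)
    if not items:
        return []  # ThreadPoolExecutor rejects max_workers=0
    n_workers = min(max_workers, len(items))
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # executor.map yields results in the same order as the inputs
        return list(zip(items, executor.map(func, items)))


if __name__ == "__main__":
    # len is a stand-in (assumption) for a function returning a clip's size
    for name, result in map_in_threads(len, ["a.mp4", "clip.mov"]):
        print(f"{name}: {result}")
```

Because `executor.map` preserves input order, each result can be paired with its file name by a plain `zip`.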

    Update

    According to @Reishin, moviepy ends up running the ffmpeg executable, so the work is already being done in a separate process. There is therefore no point in also using multiprocessing here; threads are enough to start and wait on those processes.
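To experiment with how large MAX_WORKERS can usefully be, one option is to time the same batch of work at several pool sizes. This is a rough sketch under stated assumptions: `time_pool_sizes` and `fake_io_task` are hypothetical names, and the sleep only mimics waiting on I/O. For a real measurement you would pass your own worker function and your list of files, and each size must be at least 1.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_pool_sizes(func, items, sizes):
    """Return {pool_size: elapsed_seconds} for running func over items."""
    items = list(items)
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=size) as executor:
            list(executor.map(func, items))  # drain the iterator so all work finishes
        timings[size] = time.perf_counter() - start
    return timings


def fake_io_task(_item):
    # Hypothetical stand-in for opening a video; sleeps to mimic waiting on I/O.
    time.sleep(0.01)


if __name__ == "__main__":
    results = time_pool_sizes(fake_io_task, range(20), [1, 5, 20])
    for size, seconds in results.items():
        print(f"{size:>2} workers: {seconds:.3f}s")
```

Timings on real video files will also be affected by disk caching, so it may be worth repeating each measurement before settling on a value.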