Python: Multiprocessing with pool.map while main is still working


I am a few days into learning Python and would like to understand this. I am writing a file explorer and want to speed up thumbnail creation. I have watched a bunch of tutorials about multiprocessing, but none show how to continue main() while the processes are running. I need the results in order.

import os
from multiprocessing import Pool
import image


folder = 'G:\\+++CODING+++\\test3\\'  # every backslash escaped; '\+' is an invalid escape sequence


def process_file(filepath):
    return image.convert_to_bytes(filepath, resize=(100, 100))


def create_filepaths(folder):
    filepaths = []
    for file in os.scandir(folder):
        filepaths.append(file.path)  # DirEntry.path already includes the folder
    return filepaths


def main():
    def process1():
        print('process1 start')
        pool = Pool()
        return pool.map(process_file, create_filepaths(folder), chunksize=1)

    process1()
    while True:
        # do stuff
        pass


if __name__ == '__main__':
    main()

I have tried something like the following to combine threading with multiprocessing, but I cannot call process1 in MainGuiLoop and do my main logic there, of course, since the loop is only a thread. I could only do simple GUI updates. This question is only for understanding and learning; my final implementation has to be different.

import os
import time
from multiprocessing import Pool
import image
from time import perf_counter
import threading


start_time = perf_counter()
folder = 'G:\\+++CODING+++\\test3\\'  # every backslash escaped; '\+' is an invalid escape sequence


def process_file(filepath):
    return image.convert_to_bytes(filepath, resize=(100, 100))


def create_filepaths(folder):
    filepaths = []
    for file in os.scandir(folder):
        filepaths.append(file.path)  # DirEntry.path already includes the folder
    return filepaths


def main():
    def process1():
        print('process1 start')
        pool = Pool()
        return pool.map(process_file, create_filepaths(folder), chunksize=1)

    def thread_create():
        return threading.Thread(target=MainGuiLoop)

    def MainGuiLoop():
        while True:
            time.sleep(0.5)
            print("-")

    thread1 = thread_create()
    thread1.start()
    result1 = process1()
    result2 = process1()

if __name__ == '__main__':
    main()

Bonus question: As I understand it, the result of .map is not accessible until it has finished. In fact, I need to update the thumbnails from time to time, so I have to use .map_async(), which solves the dead main() but gives unordered results. But I could create my list with an id and sort the results in the chunks that I want to draw on screen. Is that the right approach?


Solution

  • The documentation suggests that a ProcessPoolExecutor is what you're looking for. The code below is from the link. As for the unordered results, try passing in a sorted list; executor.map returns its results in the same order as its input.

    import concurrent.futures
    import math
    
    PRIMES = [
        112272535095293,
        112582705942171,
        112272535095293,
        115280095190773,
        115797848077099,
        1099726899285419]
    
    def is_prime(n):
        if n < 2:
            return False
        if n == 2:
            return True
        if n % 2 == 0:
            return False
    
        sqrt_n = int(math.floor(math.sqrt(n)))
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0:
                return False
        return True
    
    def main():
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
                print('%d is prime: %s' % (number, prime))
    
    if __name__ == '__main__':
        main()
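
On the ordering concern from the question, here is a small sketch, not taken from the linked documentation, suggesting that `Pool.map_async` returns immediately and that its `get()` still delivers results in input order even when tasks finish out of order. `slow_square` and its sleep times are made-up stand-ins for the thumbnail work.

```python
import time
from multiprocessing import Pool


def slow_square(n):
    # Hypothetical workload: larger inputs finish sooner, so the
    # completion order differs from the input order.
    time.sleep(0.05 * (5 - n))
    return n * n


def main():
    with Pool(4) as pool:
        # map_async() returns an AsyncResult at once instead of blocking.
        async_result = pool.map_async(slow_square, [1, 2, 3, 4])
        while not async_result.ready():
            print("main is still working")
            time.sleep(0.05)
        # get() returns the results in input order: [1, 4, 9, 16]
        print(async_result.get())


if __name__ == "__main__":
    main()
```

Note that `map_async` only exposes the complete list once every task has finished. `Pool.imap` yields results one at a time in input order, while `Pool.imap_unordered` is the variant that yields them in completion order.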
    

    So, for your code:

    import os
    import concurrent.futures
    import image
    
    FOLDER = 'G:\\+++CODING+++\\test3\\'  # every backslash escaped; '\+' is an invalid escape sequence
    
    def process_file(filepath):
        return image.convert_to_bytes(filepath, resize=(100, 100))
    
    def create_filepaths(folder):
        # DirEntry.path already includes the folder, so no join is needed.
        return sorted(entry.path for entry in os.scandir(folder))
    
    def main():
        with concurrent.futures.ProcessPoolExecutor() as executor:
            filepaths = create_filepaths(FOLDER)
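            # map() submits every task right away and returns a lazy iterator
            # that yields results in input order; keep a reference to it.
            results = executor.map(process_file, filepaths)
            while True:
                # do stuff
                pass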
    
    if __name__ == '__main__':
        main()
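
If the main loop needs to pick up thumbnails as they become available, one possible approach is to submit each file separately and poll the futures in submission order. The sketch below assumes a stand-in `make_thumbnail` function and placeholder file names in place of `image.convert_to_bytes` and the real folder; `drain_ready` is a hypothetical helper, not part of `concurrent.futures`.

```python
import concurrent.futures
import time


def make_thumbnail(filepath):
    # Hypothetical stand-in for image.convert_to_bytes(filepath, resize=(100, 100)).
    time.sleep(0.1)
    return f"thumbnail of {filepath}"


def drain_ready(futures, next_index):
    """Collect results of finished futures in submission order.

    Stops at the first future that is still running, so nothing is
    returned out of order, and never blocks.
    """
    ready = []
    while next_index < len(futures) and futures[next_index].done():
        ready.append(futures[next_index].result())
        next_index += 1
    return ready, next_index


def main():
    filepaths = [f"file{i}.png" for i in range(8)]  # placeholder paths
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # submit() returns immediately; the list keeps the input order.
        futures = [executor.submit(make_thumbnail, p) for p in filepaths]
        next_index = 0
        while next_index < len(futures):
            ready, next_index = drain_ready(futures, next_index)
            for thumbnail in ready:
                print(thumbnail)  # e.g. draw it in the GUI
            time.sleep(0.05)  # stand-in for the rest of the main-loop work


if __name__ == "__main__":
    main()
```

Because the futures are checked in list order, a thumbnail that finishes early simply waits until all earlier ones are done, so no ids or sorting are needed.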