Tags: python, python-multiprocessing

Using multiprocessing in Python to return values


Background

Right now I have some code that looks like this:

from typing import Set

failed_player_ids: Set[str] = set()
for player_id in player_ids:
    success = player_api.send_results(
        player_id, user=user, send_health_results=True
    )
    if not success:
        failed_player_ids.add(player_id)

This code works well, but the problem is that each call takes about 5 seconds. There is a rate limit of 2000 calls per minute, so I am way under the maximum capacity. I want to parallelize this to speed things up. This is my first time using the multiprocessing library in Python, so I am a little confused about how to proceed. I can describe what I want to do in words.

In my current code I loop through a list of player ids; if the API response is a success I do nothing, and if it fails I make a note of that player id.

I am not sure how to implement a parallelized version of this code. I have some idea, but I am a little confused.

This is what I have thought of so far:

from multiprocessing import Pool
from typing import List, Optional

num_processors_to_use = 5  # This number can be increased to get more speed

def send_player_result(player_id_list: List[str]) -> Optional[str]:
    for player_id in player_id_list:
        success = player_api.send_results(
            player_id, user=user, send_health_results=True
        )
        if not success:
            return player_id

# Caller
with Pool(processes=num_processors_to_use) as pool:
    responses = pool.map(
        func=send_player_result,
        iterable=player_id_list,
    )
    failed_player_ids = set(responses)


Any comments or suggestions would help.


Solution

  • If you are using the map function, then each item of the iterable player_id_list will be passed as a separate task to the function send_player_result. Consequently, this function should no longer expect to be passed a list of player ids, but rather a single player id. And, as you know by now, if your tasks are largely I/O bound, then multithreading is a better model. You can get a thread-based pool with either import:

    from multiprocessing.dummy import Pool
    # or
    from multiprocessing.pool import ThreadPool
    

    You will probably want to greatly increase the number of threads (but not to more than the size of player_id_list). At roughly 5 seconds per call, each thread can only make about 12 calls per minute, so even 100 threads would keep you well under your rate limit of 2000 calls per minute:

    #from multiprocessing import Pool
    from multiprocessing.dummy import Pool
    from typing import Set
    
    def send_player_result(player_id):
        success = player_api.send_results(player_id, user=user, send_health_results=True)
        return success
    
    # Only required for Windows if you are doing multiprocessing:
    if __name__ == '__main__':
        
        pool_size = 5  # This number can be increased to get more concurrency
        
        # Caller
        failed_player_ids: Set[str] = set()
        with Pool(pool_size) as pool:
            results = pool.map(func=send_player_result, iterable=player_id_list)
            for idx, success in enumerate(results):
                if not success:
                    # failed for argument player_id_list[idx]:
                    failed_player_ids.add(player_id_list[idx])
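
    As a side note, the same thread-based approach can be written with the standard library's concurrent.futures module instead of multiprocessing.dummy. Here is a minimal sketch, assuming the same player_api, user, and player_id_list as in your question:

    from concurrent.futures import ThreadPoolExecutor
    from typing import Set

    def send_player_result(player_id):
        return player_api.send_results(player_id, user=user, send_health_results=True)

    pool_size = 5  # can be raised toward len(player_id_list) for more concurrency

    failed_player_ids: Set[str] = set()
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        # executor.map yields results in the same order as the input iterable,
        # so zipping with player_id_list pairs each result with its player id
        results = executor.map(send_player_result, player_id_list)
        for player_id, success in zip(player_id_list, results):
            if not success:
                failed_player_ids.add(player_id)

    ThreadPoolExecutor behaves the same as multiprocessing.dummy.Pool for this job; it additionally offers submit() and Future objects if you later need per-task timeouts or error handling.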