Tags: python, parallel-processing, distributed-computing

Parallelizing Python code across machines on different networks


I’m looking to parallelize a batch of tasks across two computers on different networks, but I’m not sure how to do so in Python.

Suppose I have two computers, Computer A and Computer B on two different networks, and I have a batch of 100 tasks to be accomplished. Naively, I could assign Computer A and Computer B to each do 50 tasks, but if Computer A finishes its tasks before Computer B, I would like Computer A to take on some of Computer B’s remaining tasks. Both computers should return the results of their tasks to my local machine. How can this be done?


Solution

    • You need to create a distributed queue that can work across different networks, e.g. RabbitMQ.
    • Put all your tasks in the queue (see the sketch after this list).
    • Create a central worker management tool that lets you create and manage workers on Computer A and Computer B. The workers will process your tasks.
    • You also need to take care of worker availability to achieve what you described: "if Computer A finishes its tasks before Computer B, I would like Computer A to take on some of Computer B’s remaining tasks".
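
    If you wired the first two bullets up by hand, putting the batch into a distributed queue might look like the minimal sketch below, using the pika client for RabbitMQ. The broker host, credentials, queue name, and task payload are all placeholder assumptions, not values from the question. (As described next, Celery can handle this layer for you.)

        import json

        import pika

        # Connect to a RabbitMQ broker that both computers can reach over the
        # internet (host and credentials are placeholders, not a real server).
        credentials = pika.PlainCredentials("user", "password")
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host="broker.example.com", credentials=credentials)
        )
        channel = connection.channel()

        # A durable queue so queued tasks survive a broker restart.
        channel.queue_declare(queue="tasks", durable=True)

        # Publish all 100 tasks; each worker pulls the next one as soon as it
        # is free, which is what lets Computer A absorb Computer B's leftovers.
        for task_id in range(100):
            channel.basic_publish(
                exchange="",
                routing_key="tasks",
                body=json.dumps({"task_id": task_id}),
                properties=pika.BasicProperties(delivery_mode=2),  # persistent
            )

        connection.close()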

    Luckily, Python has an excellent library, Celery, which lets you achieve exactly what you want. It is well documented and has a large, active community of users and contributors. You just need to set up a broker (the queue) and configure Celery.
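
    As a minimal sketch, assuming a RabbitMQ broker that both machines can reach, a Celery app could look like the following; the broker URL, credentials, and the task body are placeholder assumptions. Save it as tasks.py on Computer A, Computer B, and your local machine.

        from celery import Celery

        app = Celery(
            "tasks",
            broker="amqp://user:password@broker.example.com:5672//",  # RabbitMQ
            backend="rpc://",  # ship results back to the caller via the broker
        )

        @app.task
        def process_item(task_id):
            # Stand-in for your real work; this stub just echoes the task id.
            return f"task {task_id} done"

    Then start a worker on each computer with celery -A tasks worker --loglevel=info; every worker consumes from the same queue, so whichever machine is free picks up the next task.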

    There are lots of features in Celery that you can use as per your requirements: monitoring, job scheduling, and Celery Canvas, to name a few.
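
    For example, the Canvas group primitive can queue the whole batch from your local machine and gather the results once the workers are done. This sketch assumes the hypothetical tasks.py module above:

        from celery import group

        from tasks import process_item

        # Queue all 100 tasks at once; free workers keep pulling the next task,
        # so a fast Computer A naturally takes over Computer B's remaining work.
        job = group(process_item.s(i) for i in range(100))
        result = job.apply_async()

        # Block until every task has finished and collect the results locally.
        print(result.get(timeout=600))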

    https://docs.celeryproject.org/en/stable/getting-started/introduction.html
    https://medium.com/swlh/python-developers-celery-is-a-must-learn-technology-heres-how-to-get-started-578f5d63fab3