Tags: python, parallel-processing, distributed-computing

Parallelizing Python code across machines on different networks


I’m looking to parallelize a batch of tasks across two computers on different networks, but I’m not sure how to do so in Python.

Suppose I have two computers, Computer A and Computer B on two different networks, and I have a batch of 100 tasks to be accomplished. Naively, I could assign Computer A and Computer B to each do 50 tasks, but if Computer A finishes its tasks before Computer B, I would like Computer A to take on some of Computer B’s remaining tasks. Both computers should return the results of their tasks to my local machine. How can this be done?


Solution

    • You need to create a distributed queue that can work across different networks, e.g. RabbitMQ.
    • Put all your tasks in the queue (see the sketch after this list).
    • Create a central worker management tool that lets you create and manage workers on Computer A and Computer B. The workers will process your tasks.
    • You also need to take care of worker availability to achieve what you described: "if Computer A finishes its tasks before Computer B, I would like Computer A to take on some of Computer B’s remaining tasks".
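
    If you wired the first two bullets up by hand, putting the batch into a distributed queue might look like the minimal sketch below, using the pika client for RabbitMQ. The broker host, credentials, queue name, and task payload are all placeholder assumptions, not values from the question. (As described next, Celery can handle this layer for you.)

        import json

        import pika

        # Connect to a RabbitMQ broker that both computers can reach over the
        # internet (host and credentials are placeholders, not a real server).
        credentials = pika.PlainCredentials("user", "password")
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host="broker.example.com", credentials=credentials)
        )
        channel = connection.channel()

        # A durable queue so queued tasks survive a broker restart.
        channel.queue_declare(queue="tasks", durable=True)

        # Publish all 100 tasks; each worker pulls the next one as soon as it
        # is free, which is what lets Computer A absorb Computer B's leftovers.
        for task_id in range(100):
            channel.basic_publish(
                exchange="",
                routing_key="tasks",
                body=json.dumps({"task_id": task_id}),
                properties=pika.BasicProperties(delivery_mode=2),  # persistent
            )

        connection.close()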

    Luckily, Python has an excellent library, Celery, which lets you achieve exactly what you want. It is well documented and has a large, active community of users and contributors. You just need to set up a broker (the queue) and configure Celery.
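
    As a minimal sketch, assuming a RabbitMQ broker that both machines can reach, a Celery app could look like the following; the broker URL, credentials, and the task body are placeholder assumptions. Save it as tasks.py on Computer A, Computer B, and your local machine.

        from celery import Celery

        app = Celery(
            "tasks",
            broker="amqp://user:password@broker.example.com:5672//",  # RabbitMQ
            backend="rpc://",  # ship results back to the caller via the broker
        )

        @app.task
        def process_item(task_id):
            # Stand-in for your real work; this stub just echoes the task id.
            return f"task {task_id} done"

    Then start a worker on each computer with celery -A tasks worker --loglevel=info; every worker consumes from the same queue, so whichever machine is free picks up the next task.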

    There are lots of features in Celery that you can use as per your requirements: monitoring, job scheduling, and Celery Canvas, to name a few.
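
    For example, the Canvas group primitive can queue the whole batch from your local machine and gather the results once the workers are done. This sketch assumes the hypothetical tasks.py module above:

        from celery import group

        from tasks import process_item

        # Queue all 100 tasks at once; free workers keep pulling the next task,
        # so a fast Computer A naturally takes over Computer B's remaining work.
        job = group(process_item.s(i) for i in range(100))
        result = job.apply_async()

        # Block until every task has finished and collect the results locally.
        print(result.get(timeout=600))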

    https://docs.celeryproject.org/en/stable/getting-started/introduction.html
    https://medium.com/swlh/python-developers-celery-is-a-must-learn-technology-heres-how-to-get-started-578f5d63fab3