Tags: python, multithreading, python-2.7, subprocess, threadpool

Threading: Complete N tasks in period T


Objective: I have n tasks to complete within a period t (say 180 seconds), where each task takes ~2 seconds to complete. The tasks must be uniformly distributed across the period t.

Setup: I am using a Mininet environment which uses lightweight virtualization to run Linux hosts. In my setup, I have 10 hosts. Each task is associated with a host (randomly chosen).

What I have done so far is distribute the tasks among the 10 hosts (according to their association) and schedule each one in the background with a random sleep (less than t), without waiting for it to complete:

$ sleep random_duration && do_task &

However, when n is large (> 2000), this exceeds the maximum number of user processes.

Alternatively, I thought of creating one thread per host and running that host's tasks in the foreground, one by one (starting the next task only after the previous one finishes). However, with this approach I cannot guarantee that all n tasks complete within t.
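
A simplified sketch of that per-host-thread idea (tasks_by_host and the echo commands are placeholders for the real task assignment):

    import subprocess
    import threading

    # Placeholder for the real task assignment: one list of commands per host.
    tasks_by_host = {
        'h1': ['echo task 1', 'echo task 2'],
        'h2': ['echo task 3'],
    }

    def run_host_tasks(commands):
        # Run this host's tasks in the foreground, one after another.
        for cmd in commands:
            subprocess.call(cmd, shell=True)

    threads = [threading.Thread(target=run_host_tasks, args=(cmds,))
               for cmds in tasks_by_host.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()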

What would be a scalable approach to tackle this problem?


Solution

  • Assuming that you can somehow track the number of running tasks, I see a solution. Look up or hard-code the MAX_PROCESSES limit. Put all of your tasks into a queue (or any collection). If n < MAX_PROCESSES (with a few slots left over for normal system work), then you already know how to handle the work.

    Where n is too large, still compute the scheduling interval time_allowed * processor_count / n, but schedule only the first MAX_PROCESSES * ratio tasks, for some ratio < 1.0. Apportion them equally among the processors. As processes complete (perhaps when half of the scheduled ones finish), schedule another batch, always keeping the number of active tasks somewhat more than processor_count (10, in your example) but comfortably less than MAX_PROCESSES; see the sketch after this answer.

    Does that handle your problem?
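
    A minimal sketch of that batching idea, assuming a counting semaphore is an acceptable way to track running tasks; PERIOD, MAX_ACTIVE, and tasks_by_host are placeholders for the real values:

        import subprocess
        import threading
        import time

        PERIOD = 180.0      # t, the period over which tasks must be spread
        MAX_ACTIVE = 500    # keep this comfortably below the user-process limit

        # Placeholder data: tasks_by_host[host] is that host's list of commands.
        tasks_by_host = {
            'h%d' % i: ['echo task %d-%d' % (i, j) for j in range(200)]
            for i in range(10)
        }

        active = threading.Semaphore(MAX_ACTIVE)    # caps concurrently running tasks

        def run_one(cmd):
            try:
                subprocess.call(cmd, shell=True)
            finally:
                active.release()                    # free a slot once the task finishes

        def pace_host(commands):
            # Spread this host's tasks uniformly across PERIOD.
            interval = PERIOD / max(len(commands), 1)
            start = time.time()
            for i, cmd in enumerate(commands):
                delay = start + i * interval - time.time()
                if delay > 0:
                    time.sleep(delay)               # wait for this task's slot in the schedule
                active.acquire()                    # block if too many tasks are already live
                threading.Thread(target=run_one, args=(cmd,)).start()

        host_threads = [threading.Thread(target=pace_host, args=(cmds,))
                        for cmds in tasks_by_host.values()]
        for t in host_threads:
            t.start()
        for t in host_threads:
            t.join()

    With 2000 tasks spread over 180 seconds and ~2-second tasks, only about 20-25 tasks are live at any moment, so MAX_ACTIVE mainly acts as a safety cap against the user-process limit.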