python multithreading multiprocessing os.system simultaneous

Run multiple .py from another .py simultaneously with arguments and timeout

I have a program, let's say "main.py", which is run with a single argument, for example "python main.py 3" or "python main.py 47". The argument is the ID the program should process.

I'm trying to write another script, let's say "start.py", that launches a given number of these programs. If start.py sets threads = 4 and timeout = 5, it should run "python main.py 1", "python main.py 2", "python main.py 3" and "python main.py 4" concurrently, but start each command 5 seconds after the previous one.

I know how to launch them one after another, but then each command does not start until the previous one has finished:

import os
import time

threads = 4
id = 1
for i in range(threads):
    os.system(f"python main.py {id}")  # blocks until this run of main.py exits
    id += 1
    time.sleep(5)

I am trying to do this with multiprocessing, but without success. What is the best way to implement this, and am I going in the right direction?

I've already done this in bash, but I need to do it in Python:

for ((i=1; i<=4; i++))
do
    python3 main.py "$i" &
done

Solution

If you don't want to, or can't, make changes to main.py, then the simplest change to your current code is to execute each system call in its own thread so that the loop does not block:

from threading import Thread
import os
import time

def run_main(id):
    os.system(f"python main.py {id}")

threads = 4
id = 1
started_threads = []
for i in range(threads):
    if i != 0:
        time.sleep(5)
    t = Thread(target=run_main, args=(id,))
    t.start()
    started_threads.append(t)
    id += 1
for t in started_threads:
    t.join()

Note that I have moved the call to time.sleep, since your version made an extra, unneeded call after launching the last command.
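Strictly speaking, the threads are only there because os.system blocks. subprocess.Popen starts the child and returns immediately, which is the closest Python equivalent of `&` in the bash loop, so the threads can be dropped. Here is a minimal sketch, assuming main.py is in the current working directory; default_cmd and launch_staggered are hypothetical helper names. Like the thread version, it still starts one interpreter per command.

```python
import subprocess
import sys
import time


def default_cmd(id_):
    # Hypothetical default: run main.py with the same interpreter that runs
    # this script, rather than whatever "python" is first on PATH.
    return [sys.executable, "main.py", str(id_)]


def launch_staggered(ids, delay, build_cmd=default_cmd):
    """Start one child process per id, waiting `delay` seconds between launches.

    Returns the exit codes of all children once every one has finished.
    """
    procs = []
    for i, id_ in enumerate(ids):
        if i != 0:
            time.sleep(delay)  # stagger the starts, not the finishes
        # Popen returns as soon as the child is started, like `&` in bash.
        procs.append(subprocess.Popen(build_cmd(id_)))
    # Like bash `wait`: block until every child has exited.
    return [p.wait() for p in procs]
```

For example, launch_staggered(range(1, 5), delay=5) starts "main.py 1" through "main.py 4" five seconds apart and returns their exit codes. Passing the command as a list rather than a shell string also avoids the quoting problems that os.system is prone to.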

But this is rather expensive, because you are starting a new Python interpreter for each invocation of main.py. If I understand the comment offered by @BoarGules correctly (although, taken literally, what he suggested would run the function main four times sequentially rather than in parallel), the following is an alternative implementation, provided main.py is structured like this:

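import sys

def main(id):
    ...  # process the given ID

if __name__ == '__main__':
    # Command-line arguments are strings; convert to int so the type
    # matches the integer IDs that start.py passes in.
    main(int(sys.argv[1]))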

Then, if you are running under Linux or another platform where multiprocessing starts new processes with fork (the default on Linux before Python 3.14), your start.py can be coded as follows:

from multiprocessing import Process
import time
import main

threads = 4
id = 1
started_processes = []
for i in range(threads):
    if i != 0:
        time.sleep(5)
    p = Process(target=main.main, args=(id,))
    p.start()
    started_processes.append(p)
    id += 1
for p in started_processes:
    p.join()

But if you are running under Windows or macOS, or on any platform that uses spawn (or forkserver) to start new processes, then you must code start.py as follows:

from multiprocessing import Process
import time
import main

# required with the spawn and forkserver start methods (e.g. on Windows):
if __name__ == '__main__':
    threads = 4
    id = 1
    started_processes = []
    for i in range(threads):
        if i != 0:
            time.sleep(5)
        p = Process(target=main.main, args=(id,))
        p.start()
        started_processes.append(p)
        id += 1
    for p in started_processes:
        p.join()

With spawn, each new Process instance starts a fresh Python interpreter anyway, so you will not save much over the first solution I offered.

This is why, when you post a question tagged with multiprocessing, you should also tag it with your platform.
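Alternatively, you can make start.py independent of the platform default by choosing the start method explicitly with multiprocessing.get_context, so the script behaves the same everywhere. Here is a minimal sketch, assuming main.py exposes main(id) as above. start_staggered is a hypothetical helper name, and "spawn" is used as the default because it is available on every platform.

```python
import multiprocessing as mp
import time


def start_staggered(target, ids, delay, method="spawn"):
    """Run target(id) in its own process for each id, `delay` seconds apart.

    `method` is passed to multiprocessing.get_context ("spawn", "fork" or
    "forkserver"). Returns each process's exit code after all have joined.
    """
    ctx = mp.get_context(method)
    procs = []
    for i, id_ in enumerate(ids):
        if i != 0:
            time.sleep(delay)  # delay between launches only
        p = ctx.Process(target=target, args=(id_,))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

With spawn, start.py still needs the guard: import main at the top, then call start_staggered(main.main, range(1, 5), delay=5) under if __name__ == '__main__':.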