I am a beginner in Python, so I would very appreciate it if you can help me with clear and easy explanations.
In my Python script, I have a function that makes several threads to do an I/O bound task (What it really does is making several Azure requests concurrently using Azure Python SDK), and I also have a list of time differences like [1 second, 3 seconds, 10 seconds, 5 seconds, ..., 7 seconds] so that I execute the function again after each time difference.
Let's say I want to execute the function and execute it again after 5 seconds. The first execution can take much more than 5 seconds to finish as it has to wait for the requests it makes to be done. So, I want to execute each function in a different process so that different executions of the function do not block each other (Even if they don't block each other without using different processes, I just didn't want threads in different executions to be mixed).
My code is like:
import multiprocessing as mp
from time import sleep
def function(num_threads):
"""
This functions makes num_threads number of threads to make num_threads number of requests
"""
# Time to wait in seconds between each execution of the function
times = [1, 10, 7, 3, 13, 19]
# List of number of requests to make for each execution of the function
num_threads_list = [1, 2, 3, 4, 5, 6]
processes = []
for i in range(len(times)):
p = mp.Process(target=function, args=[num_threads_list[i]])
p.start()
processes.append(p)
sleep(times[i])
for process in processes:
process.join()
Question I have due to mare:
the length of the list "times" is very big in my real script (, which is 1000). Considering the time differences in the list "times", I guess there are at most 5 executions of the function running concurrently using processes. I wonder if each process terminates when it is done executing the function, so that there are actually at most 5 processes running. Or, Does it remain so that there will be 1000 processes, which sounds very weird given the number of CPU cores of my computer?
Please tell me if you think there is a better way to do what I explained above.
Thank you!
The main problem I destilate from your question is having a large amount of processes running simultaniously.
You can prevent that by maintaining a list of processes with a maximum length. Something like this.
import multiprocessing as mp
from time import sleep
from random import randint
def function(num_threads):
"""
This functions makes num_threads number of threads to make num_threads number of requests
"""
sleep(randint(3, 7))
# Time to wait in seconds between each execution of the function
times = [1, 10, 7, 3, 13, 19]
# List of number of requests to make for each execution of the function
num_threads_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
process_data_list = []
max_processes = 4
# =======================================================================================
def main():
times_index = 0
while times_index < len(times):
# cleanup stopped processes -------------------------------
cleanup_done = False
while not cleanup_done:
cleanup_done = True
# search stopped processes
for i, process_data in enumerate(process_data_list):
if not process_data[1].is_alive():
print(f'process {process_data[0]} finished')
# remove from processes
p = process_data_list.pop(i)
del p
# start new search
cleanup_done = False
break
# try start new process ---------------------------------
if len(process_data_list) < max_processes:
process = mp.Process(target=function, args=[num_threads_list[times_index]])
process.start()
process_data_list.append([times_index, process])
print(f'process {times_index} started')
times_index += 1
else:
sleep(0.1)
# wait for all processes to finish --------------------------------
while process_data_list:
for i, process_data in enumerate(process_data_list):
if not process_data[1].is_alive():
print(f'process {process_data[0]} finished')
# remove from processes
p = process_data_list.pop(i)
del p
# start new search
break
print('ALL DONE !!!!!!')
# =======================================================================================
if __name__ == '__main__':
main()
It runs max_processes at once as you can see in the result.
process 0 started
process 1 started
process 2 started
process 3 started
process 3 finished
process 4 started
process 1 finished
process 5 started
process 0 finished
process 2 finished
process 5 finished
process 4 finished
ALL DONE !!!!!!