I am trying to refine a very large JSON data set. In order to do that, I split the file into many subparts (with the Unix split command) and assign each part to a process so that it can be fetched and refined independently.
Each process has its input file, which corresponds to a subset of the main dataset.
Here is what my code looks like:
import multiprocessing as mp

def my_target(input_file, output_file):
    ...
    # some code
    ...
    # Is it possible to end the process here?
    # end of the function

worker_count = mp.cpu_count()
processes = [mp.Process(target=my_target, args=(input_file, output_file)) for _ in range(worker_count)]

for p in processes:
    p.start()
It is very likely that the processes won't terminate at the same time, and hence my question: is it possible to terminate a process when it reaches the last line of the target function my_target()?
I suppose that letting processes sit idle after they have finished their tasks could slow down the remaining processes, no?
Any recommendations?
You should check this related question, as it covers what you might need: how to terminate process using python's multiprocessing. You also have to take care of zombie processes: if a process has finished but is never joined, it lingers as a zombie.
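To be clear, each child process exits by itself as soon as my_target() returns; what you should add is a join() so the parent reaps it. A minimal sketch of that, assuming one split file per worker (the part_*.json names are placeholders for whatever split produced, and my_target stands in for your refining code):

import multiprocessing as mp

def my_target(input_file, output_file):
    # The child process ends on its own as soon as this function returns;
    # no explicit terminate() call is needed.
    pass

if __name__ == "__main__":
    # Placeholder names standing in for the files produced by split.
    jobs = [("part_%02d.json" % i, "part_%02d.refined.json" % i)
            for i in range(mp.cpu_count())]

    processes = [mp.Process(target=my_target, args=job) for job in jobs]
    for p in processes:
        p.start()

    # join() reaps each child as soon as it exits, so no zombie is left
    # behind; a finished, joined child does not slow the other workers.
    for p in processes:
        p.join()

If you end up with more split files than CPUs, a multiprocessing.Pool with pool.starmap(my_target, jobs) would reuse a fixed set of workers instead of creating one process per file.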