I'm trying to make a genetic algorithm that runs its candidates in parallel using multiprocessing, so I wrote code like this:
import multiprocessing as mp
...
parents = []
queue = mp.Queue(maxsize=poolSize - 1)
processes = []
for _ in range(poolSize - 1):
    processes.append(mp.Process(target=generate_parent, args=(queue,)))
for process in processes:
    process.start()
for process in processes:
    process.join()
for _ in range(poolSize - 1):
    parents.append(queue.get())
Something went wrong and I just don't know what. When I tried debugging the code, I saw that when it gets to "process.start()" the execution just stops, as if it had reached a "while True: continue". The same happens when I run it normally: the code gets stuck at some point, but it doesn't end the process or raise any error.
I'm new to multiprocessing and parallelism in general, and I would be glad if someone could help me.
The whole code is here: https://github.com/estevaopbs/Molpro_tools
This specific problem is in genetic.py, line 144. (I know there are some other problems in the code. I'm fixing them, and they are not supposed to affect this specific problem.)
It looks like the problem is here (and if it's not, it's still a trouble spot):
def fn_generate_parent(queue=None):
    while True:
        try:
            parent = Chromosome()
            parent.Genes = create_lookup[random.choices(create_methods.methods, create_methods.rate)[0]]\
                (first_molecule)
            parent.Fitness = get_fitness(parent.Genes, fitness_param, threads_per_calculation)
            break
        except:
            os.remove(f'data/{parent.Genes.__hash__()}.inp')
            os.remove(f'data/{parent.Genes.__hash__()}.out')
            os.remove(f'data/{parent.Genes.__hash__()}.xml')
            continue
If there's an exception in the try block - an undeclared variable, unknown method, division by 0 - the while True continues as a silent infinite loop.
The problem is that if something goes wrong, the code doesn't take any corrective action or stop; it just quietly continues and possibly keeps encountering the same error. I would remove the continue, or at least augment it with some error messages, logging, or maybe a counter that only retries a few times.
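As a rough illustration of the "counter that only retries a few times" idea, here is a minimal sketch. The names retry_with_limit, build, cleanup, and max_attempts are hypothetical and not part of the linked repository; build stands in for whatever creates the parent and computes its fitness, and cleanup for the file removal. It catches Exception rather than using a bare except, logs each failure, and gives up with an error once the limit is reached instead of looping forever.

```python
import logging

logger = logging.getLogger(__name__)


def retry_with_limit(build, max_attempts=3, cleanup=None):
    """Call build() until it succeeds, at most max_attempts times.

    build and cleanup are hypothetical placeholders: build is any
    zero-argument callable that may raise, cleanup (optional) is a
    zero-argument callable run after each failed attempt.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return build()
        except Exception as error:  # unlike a bare except, KeyboardInterrupt still propagates
            last_error = error
            # Make the failure visible instead of retrying silently.
            logger.warning("attempt %d/%d failed: %r", attempt, max_attempts, error)
            if cleanup is not None:
                cleanup()
    # Stop after the last attempt and keep the original error as the cause.
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Inside the worker, the body of the try block would become the build callable and the os.remove calls the cleanup callable. With something like this in place, a typo or a missing file shows up in the log after a few attempts rather than as a process that appears to hang.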