In my implementation of an SVM prediction model, I would like to make the execution of the function svmutil.svm_train multi-threaded. Although I'm new to writing multi-threaded programs, I have some knowledge of parallel programming concepts, and I believe training multiple models with different sets of parameters simultaneously is theoretically possible.
Setup:
import svmutil
import multiprocessing as mp
problem = svmutil.svm_read_problem('my_problem')
# I have a list of svm_param objects I want to train
params = myCode.svm_param_list()
# Calculate the number of worker processes (mp.Pool starts processes, not threads)
processes = mp.cpu_count() * 2
Split the training into multiple threads of execution:
pool = mp.Pool(processes)
for param in params:
    pool.apply(svmutil.svm_train, args=(problem, param,))
pool.close()
pool.join()
However, the problem I'm having is that the call to svmutil.svm_train cannot be pickled, as it involves a ctypes pointer. The Python interpreter gives me the error:
ValueError: ctypes objects containing pointers cannot be pickled
I'd rather adjust my implementation than somehow pickle the function in the module. Therefore, I would like to know, is there some way in which I can parallelise this function without pickling it?
Also, how can I gather the results of the function? Ideally, this would be a list of trained models (the output of svmutil.svm_train for each time I called the function).
I managed to build a solution using the top answer to the question linked in the comment above. I was able to parallelise the call to the Python function in an external module by spawning pipes and processes and listening for results. The call that sends each result through the pipe is: pipe.send(svmutil.svm_train(problem, param))
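For anyone who wants to see the overall shape, here is a minimal sketch of the pipe-and-process pattern, not my exact code. The names run_in_processes, _worker and fake_train are hypothetical. fake_train stands in for svmutil.svm_train so the example runs without LIBSVM. It assumes a Unix-like system where the "fork" start method is available, so each child inherits the function and its arguments in memory instead of receiving them via pickle.

```python
import multiprocessing as mp


def _worker(conn, func, args):
    """Run func(*args) in the child and send the result back through the pipe."""
    try:
        conn.send(func(*args))
    finally:
        conn.close()


def run_in_processes(func, arg_tuples):
    """Call func(*args) once per tuple, each in its own process; results keep input order."""
    # Assumption: "fork" is available (Unix-like OS). Forked children inherit
    # func and args directly, so neither is pickled on the way in.
    ctx = mp.get_context("fork")
    jobs = []
    for args in arg_tuples:
        # duplex=False gives a (receive-only, send-only) pair of connections.
        parent_conn, child_conn = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=_worker, args=(child_conn, func, args))
        proc.start()
        child_conn.close()  # the parent only needs the receiving end
        jobs.append((proc, parent_conn))

    results = []
    for proc, parent_conn in jobs:
        # Receive before joining so a child never blocks on a full pipe.
        results.append(parent_conn.recv())
        proc.join()
    return results


def fake_train(problem, param):
    # Hypothetical stand-in for svmutil.svm_train.
    return sum(problem) * param
```

With LIBSVM the call would look something like run_in_processes(svmutil.svm_train, [(problem, p) for p in params]). Two caveats: the value sent back through the pipe is still pickled, so the returned object has to be picklable, and this sketch starts one process per parameter set at once, so a long params list would need to be processed in batches.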