Tags: python, python-2.7, multiprocessing, libsvm, pool

How to parallelise a function from another module using Python multiprocessing without pickling it?


In my implementation of an SVM prediction model, I would like to run the function svmutil.svm_train in parallel. Although I'm new to writing multiprocess programs, I have some knowledge of parallel programming concepts, and training multiple models with different parameter sets simultaneously should be possible in principle.

Setup:

import svmutil
import multiprocessing as mp

problem = svmutil.svm_read_problem('my_problem')
# I have a list of svm_parameter objects I want to train with
params = myCode.svm_param_list()

# Calculate the number of worker processes
processes = mp.cpu_count() * 2

Split the training across multiple worker processes:

pool = mp.Pool(processes)
results = []
for param in params:
    # apply() blocks until each call finishes; apply_async submits the
    # jobs so they can run concurrently
    results.append(pool.apply_async(svmutil.svm_train, args=(problem, param)))

pool.close()
pool.join()
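For workloads whose arguments and return values are picklable, apply_async is the standard way to run pool jobs concurrently and gather their return values. The sketch below is runnable on its own; square is a hypothetical stand-in for the training call, used here only because svmutil objects cannot cross process boundaries:

```python
import multiprocessing as mp

def square(x):
    # Hypothetical picklable stand-in for svmutil.svm_train.
    return x * x

def run_pool(values):
    pool = mp.Pool(mp.cpu_count())
    # apply_async returns immediately with an AsyncResult;
    # calling .get() later blocks until that job's result is ready.
    async_results = [pool.apply_async(square, args=(v,)) for v in values]
    results = [r.get() for r in async_results]
    pool.close()
    pool.join()
    return results
```

Because the AsyncResult objects are collected in submission order, each entry of the returned list lines up with its input parameter.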

However, the problem I'm having is that the call to svmutil.svm_train cannot be pickled, because the svm_problem and svm_parameter objects it receives wrap ctypes pointers. The Python interpreter gives me the error:

ValueError: ctypes objects containing pointers cannot be pickled

I'd rather adjust my implementation than try to make the module's objects picklable. So: is there some way to parallelise this function without pickling it?

Also, how can I gather the results? Ideally, this would be a list of trained models (the return value of svmutil.svm_train for each call).


Solution

  • I managed to use the top answer in the question linked above to build a solution. I parallelised the call to the external module's function by spawning a Pipe and a Process per parameter set and listening for the results. The call each worker executes is: pipe.send(svmutil.svm_train(problem, param))
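The approach can be sketched as follows. Here svm_train_stub is a hypothetical stand-in for svmutil.svm_train, so the sketch runs on its own; with the real function, the arguments avoid pickling only under the fork start method, where the child inherits them from the parent's memory:

```python
import multiprocessing as mp

def svm_train_stub(problem, param):
    # Hypothetical stand-in for svmutil.svm_train.
    return (problem, param)

def worker(conn, problem, param):
    # Runs in the child process: train, then ship the result back.
    conn.send(svm_train_stub(problem, param))
    conn.close()

def train_all(problem, params):
    jobs = []
    for param in params:
        # duplex=False gives a receive-only and a send-only end.
        recv_conn, send_conn = mp.Pipe(duplex=False)
        proc = mp.Process(target=worker, args=(send_conn, problem, param))
        proc.start()
        jobs.append((recv_conn, proc))
    # Collect in submission order; recv() blocks until the child sends.
    results = [conn.recv() for conn, _ in jobs]
    for _, proc in jobs:
        proc.join()
    return results
```

Note that conn.send still pickles whatever it transmits, so if the trained model itself contains ctypes pointers, the worker may instead need to persist it with svmutil.svm_save_model and send back the file path.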