I am computing some data for every year, and the computation is relatively intensive. I have used numba (to great effect) to reduce the time taken to run the iterations that compute the data. However, given that I have 20 years of independent data, I would like to split them into 5 groups of 4 that could run over 4 different CPU cores.
    def compute_matrices(self):
        for year in self.years:
            self.xs[year].compute_matrix()
In the above code snippet, the function is a method within a class that has the attributes years and xs. Each year is simply an integer, and xs[year] is a cross-section object that houses its data (xs.data) and the compute_matrix() method.
What is the easiest way to split this across multiple cores?
It would be great if there were a Numba-style decorator that could automatically break up the loops, run them over different processes, and glue the results together. Does this exist?
Is my best bet to use Python's multiprocessing module?
So there are a couple of things you could look at for this:
NumbaPro: https://store.continuum.io/cshop/accelerate/. This is basically Numba on steroids, providing support for many- and multicore architectures. Unfortunately it is not cheap.
Numexpr: https://code.google.com/p/numexpr/. This is an expression evaluator for numpy arrays that supports multithreading.
Numexpr-Numba (experimental): https://github.com/gdementen/numexpr-numba. As the name suggests, this is Numexpr using a Numba backend.
A lot of the answer will depend on what is done in your compute_matrix method.
The fastest solution (in terms of development time) would probably be to split your computations using the multiprocessing library. Note that multiprocessing is easier to use if your compute_matrix function has no side effects, because each worker process operates on its own copy of the data, so changes made there are not visible in the parent process.
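As a rough sketch of the multiprocessing route, something like the following might work. It assumes compute_matrix can be changed to return its result rather than store it on the object; the CrossSection class, the compute_year helper, and the xs_by_year dictionary are hypothetical stand-ins for your own code, and the tiny list-based "matrix" is only a placeholder for the real Numba-accelerated work.

```python
from multiprocessing import Pool


class CrossSection:
    """Hypothetical stand-in for the cross-section object in the question."""

    def __init__(self, data):
        self.data = data

    def compute_matrix(self):
        # Placeholder for the expensive computation. It returns the result
        # instead of storing it on self, so it has no side effects.
        return [[a * b for b in self.data] for a in self.data]


def compute_year(item):
    """Worker function: runs in a child process on a pickled copy of the object."""
    year, xs = item
    return year, xs.compute_matrix()


def compute_matrices(xs_by_year, processes=4):
    """Compute one matrix per year, spreading the years over worker processes."""
    with Pool(processes=processes) as pool:
        # map() sends each (year, object) pair to a worker and collects
        # the returned (year, matrix) pairs in the original order.
        return dict(pool.map(compute_year, list(xs_by_year.items())))


if __name__ == "__main__":
    # 20 independent years, as in the question (the year range is made up).
    xs_by_year = {year: CrossSection([1, 2, 3]) for year in range(1990, 2010)}
    matrices = compute_matrices(xs_by_year)
    print(len(matrices), "matrices computed")
```

The worker is a module-level function because multiprocessing has to pickle it to send it to the child processes. Pool.map also hands out the years for you, so there is no need to build the groups by hand. Each object is pickled on the way out and each matrix on the way back, so this only pays off if the computation per year is large compared with that transfer cost.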