Search code examples
pythonnumpymultiprocessingnumexpr

Parallelizing a Numpy vector operation


Let's use, for example, numpy.sin()

The following code will return the value of the sine for each value of the array a:

import numpy
a = numpy.arange( 1000000 )
result = numpy.sin( a )

But my machine has 32 cores, so I'd like to make use of them. (The overhead might not be worthwhile for something like numpy.sin() but the function I actually want to use is quite a bit more complicated, and I will be working with a huge amount of data.)

Is this the best (read: smartest or fastest) method:

from multiprocessing import Pool
if __name__ == '__main__':
    pool = Pool()
    result = pool.map( numpy.sin, a )

or is there a better way to do this?


Solution

  • There is a better way: numexpr

    Slightly reworded from their main page:

    It's a multi-threaded VM written in C that analyzes expressions, rewrites them more efficiently, and compiles them on the fly into code that gets near optimal parallel performance for both memory and cpu bounded operations.

    For example, in my 4 core machine, evaluating a sine is just slightly less than 4 times faster than numpy.

    In [1]: import numpy as np
    In [2]: import numexpr as ne
    In [3]: a = np.arange(1000000)
    In [4]: timeit ne.evaluate('sin(a)')
    100 loops, best of 3: 15.6 ms per loop    
    In [5]: timeit np.sin(a)
    10 loops, best of 3: 54 ms per loop
    

    Documentation, including supported functions here. You'll have to check or give us more information to see if your more complicated function can be evaluated by numexpr.