Numpy: Multiprocessing a matrix multiplication with pool

I am trying to calculate a dot product with a multiprocessing Pool:

import numpy as np
from multiprocessing import Pool

pool = Pool(8)
x = np.array([2,3,1,0])
y = np.array([1,3,1,0])
print np.dot(x,y) #works
print pool.map(np.dot,x,y) #error below

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I also tried:

ne.evaluate('dot(x, y)') 


TypeError: 'VariableNode' object is not callable

Solution

Unfortunately, what you want can't be done the way you're attempting it, and there's no simple alternative either.

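To make things worse, the multiprocessing.pool documentation for Python 2.7 is misleading about Pool.map: it isn't at all equivalent to the builtin map. The builtin map can take multiple iterables whose elements are passed together to the function, while Pool.map accepts only one. Its third positional argument is chunksize, so in your call y is treated as a chunk size, which is presumably where the ValueError comes from. This has been known, and neither fixed nor noted in the docstring for Pool.map, since at least 2011. There's a partial fix in Python 3.3 and later: Pool.starmap, which takes a single iterable of argument tuples and unpacks each one.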
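As a hedged sketch of the usual workaround: pack each set of arguments into one tuple so that Pool.map sees a single iterable, or on Python 3.3+ let Pool.starmap do the unpacking. The names dot, dot_pair and make_pairs are hypothetical helpers made up for this illustration. Note that each task here is a complete dot product on its own pair of vectors; this does not split one dot product across processes.

```python
from multiprocessing import Pool

import numpy as np


def dot(a, b):
    """Two-argument worker, usable with Pool.starmap."""
    return np.dot(a, b)


def dot_pair(pair):
    """One-argument worker: Pool.map passes each item as a single argument."""
    a, b = pair
    return dot(a, b)


def make_pairs(n_pairs=4, size=3, seed=0):
    """Build made-up (a, b) vector pairs to act as independent tasks."""
    rng = np.random.RandomState(seed)
    return [(rng.rand(size), rng.rand(size)) for _ in range(n_pairs)]


if __name__ == "__main__":
    pairs = make_pairs()
    pool = Pool(2)
    try:
        via_map = pool.map(dot_pair, pairs)    # one iterable of tuples
        via_starmap = pool.starmap(dot, pairs)  # Python 3.3+ only
    finally:
        pool.close()
        pool.join()
    print(via_map)
    print(via_starmap)
```

Both calls should give the same list, one dot product per pair. The workers live at module level so that they can be pickled and sent to the worker processes.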

Honestly, though, the multiprocessing module isn't terribly useful for speeding up numerical code. For example, see here for a long discussion of situations where many numpy operations were slower when done through multiprocessing.

However, there's another issue here as well: you can't simply parallelize like this. map takes iterables of arguments and applies a function to each set of elements in turn. That's not going to do what you want: in this case, try map(np.dot,x,y), and note that what you get is simply the product of each pair of elements of x and y as a list (an iterator in Python 3), not the dot product. Running a function many times in parallel is easy. Making a single call run in parallel is hard, because the function itself has to be parallel, and that usually means rewriting it.
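To see this concretely with the arrays from the question, here is a small sketch written for Python 3 (where map returns an iterator, hence the list comprehension):

```python
import numpy as np

x = np.array([2, 3, 1, 0])
y = np.array([1, 3, 1, 0])

# map pairs the elements up: np.dot(2, 1), np.dot(3, 3), np.dot(1, 1), np.dot(0, 0)
elementwise = [int(v) for v in map(np.dot, x, y)]
print(elementwise)            # [2, 9, 1, 0]

# The dot product is the sum of those products, which map never computes.
print(sum(elementwise))       # 12
print(int(np.dot(x, y)))      # 12
```

So a parallel map over x and y would only farm out four scalar multiplications, with the reduction still left to do.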

Except np.dot is actually already parallelized, if your numpy is built against an optimized BLAS library such as ATLAS (try np.__config__.show() to check). In that case you don't need to do anything: np.dot(x,y) should already use all your cores, provided the BLAS build itself is multithreaded.

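I should note that this is, however, restricted to some dtypes; floating-point types are generally the best supported, since BLAS only provides routines for float and complex data and integer arrays fall back to a slower non-BLAS code path. On my computer, for example, behold the striking difference between float and int: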

In [19]: a = np.matrix(np.random.randint(0,10,size=(1000,1000)),dtype='int')

In [20]: b = a.astype('float')

In [23]: %timeit np.dot(a,a)
1 loops, best of 3: 6.91 s per loop

In [24]: %timeit np.dot(b,b)
10 loops, best of 3: 28.1 ms per loop
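If you want to repeat this comparison outside IPython, a rough sketch with the standard timeit module could look like the following. The helper name time_dot and the default size are assumptions chosen for illustration, and the numbers you get will depend entirely on your machine and on how your numpy was built.

```python
import timeit

import numpy as np


def time_dot(dtype, n=500, repeats=3, seed=0):
    """Return the best wall-clock time in seconds for one n x n np.dot."""
    rng = np.random.RandomState(seed)
    a = rng.randint(0, 10, size=(n, n)).astype(dtype)
    # number=1: each measurement is a single np.dot call
    return min(timeit.repeat(lambda: np.dot(a, a), number=1, repeat=repeats))


if __name__ == "__main__":
    for dtype in ("int64", "float64"):
        print(dtype, time_dot(dtype))
```

Taking the minimum over a few repeats mirrors what %timeit reports as "best of 3".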

For numexpr (when asking questions, it's useful to spell out abbreviations like ne for those who might not know them), there's only a very limited set of supported functions; check the documentation for a list. The error you get is because dot isn't a supported function. Since you're dealing with 1D arrays, and dot is pretty simple to define, the following will work: ne.evaluate('sum(x*y)'). I doubt, however, that you're planning on only using 1D arrays.
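A minimal sketch of that 1D workaround, wrapped in a hypothetical helper dot_1d. The fallback to plain numpy when numexpr isn't installed is my addition so the snippet runs either way, and the arrays are passed explicitly through local_dict rather than picked up from the calling scope:

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is an optional third-party package
    ne = None


def dot_1d(x, y):
    """Dot product of two 1D arrays, via numexpr's sum reduction if available."""
    if ne is None:
        return np.sum(x * y)  # plain numpy fallback, same arithmetic
    # numexpr has no dot(), but it does support * and the sum() reduction
    return ne.evaluate("sum(x * y)", local_dict={"x": x, "y": y})


x = np.array([2, 3, 1, 0])
y = np.array([1, 3, 1, 0])
result = dot_1d(x, y)
print(int(result), int(np.dot(x, y)))  # 12 12
```

This only covers the vector case; it is not a substitute for np.dot on 2D arrays.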

If you really want to parallelize things on a large scale, I'd suggest using IPython and its parallel system (since split out into the separate ipyparallel package), which, unlike Python's multiprocessing, is actually useful for numerical work. As an added bonus, it can also parallelize across computers. However, this sort of parallelization is usually only worthwhile for things that take a while per run; if you just want to use all your cores for simple things, then it's probably best to hope that numpy has multiprocessor support for the functions you want to use.