Row-wise Broadcast of arbitrary function in numpy


I have a matrix of vectors where each row is a vector. I want to take the mean of all the vectors, then calculate the cosine distance between each vector and this mean, returning an array of distances.

>>> from numpy import arange
>>> x = arange(1,10).reshape(3,3)
>>> x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> m = x.mean(0)
>>> m
array([4., 5., 6.])

The cosine distances between each row and the mean are as follows:

>>> from scipy.spatial.distance import cosine
>>> cosine([1,2,3], [4,5,6])
0.0253681538029239
>>> cosine([4,5,6], [4,5,6])
0.0
>>> cosine([7,8,9], [4,5,6])
0.001809107314273195

Therefore I want to write a function f such that

>>> f(x, m)
array([0.0253681538029239, 0.0, 0.001809107314273195])

(Or the transpose of such an array. It doesn't matter.)

What is the most efficient, most numpythonic way to write f? It seems like the trick is to get the proper broadcast over the cosine function, but I haven't figured out how to do this. The following doesn't work.

>>> from numpy import frompyfunc
>>> f = frompyfunc(cosine, 2, 1)
>>> f(x, m)
array([[0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]], dtype=object)

(It looks like here numpy is applying cosine element-wise instead of row-wise.)

Is there a way to do this without writing a for-loop?


It looks like this is possible with apply_along_axis.

>>> from numpy import apply_along_axis
>>> from functools import partial
>>> g = partial(cosine, m)
>>> apply_along_axis(g, 1, x)
array([0.02536815, 0.        , 0.00180911])

Is this the most efficient way?
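
A note on the element-wise behaviour seen with frompyfunc: np.vectorize accepts a signature argument that declares which dimensions the function consumes, and that gives row-wise application of an arbitrary function. The sketch below assumes a hand-written cos_dist helper (a hypothetical stand-in for scipy's cosine, used so the example needs only NumPy). Like apply_along_axis, this still calls the Python function once per row, so it is a convenience rather than a speed-up.

```python
import numpy as np

def cos_dist(u, v):
    # Hypothetical stand-in for scipy.spatial.distance.cosine:
    # 1 - (u . v) / (|u| |v|) for two 1-D vectors.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

x = np.arange(1, 10).reshape(3, 3)
m = x.mean(axis=0)

# "(n),(n)->()" says: take two length-n vectors, return one scalar.
# NumPy then loops over the remaining (row) dimension of x and
# reuses m for every row.
row_wise = np.vectorize(cos_dist, signature="(n),(n)->()")

d = row_wise(x, m)
print(d.shape)  # (3,)
print(d)
```

The same wrapper should work for any function of two equal-length vectors that returns a scalar; only the signature string needs to change for other input or output shapes.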


Solution

  • cdist expects two 2-D arrays, so you need to reshape your mean array to be 2D (a single row).

    >>> from scipy.spatial.distance import cdist
    >>> cdist(x, m.reshape(1, -1), metric='cosine')
    array([[2.53681538e-02],
           [2.22044605e-16],
           [1.80910731e-03]])
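
For this particular metric the loop can also be avoided with plain NumPy, since cosine distance is only a dot product divided by two norms. The following is a minimal sketch, not the method used by scipy itself; the function name cosine_to_vector is made up for illustration, and it assumes no row of x (and not m) is all zeros, since that would divide by zero.

```python
import numpy as np

def cosine_to_vector(x, m):
    """Cosine distance between each row of x and the vector m.

    Hypothetical helper for illustration; assumes x is 2-D, m is 1-D,
    and neither contains a zero-length vector.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    dots = x @ m                                   # one dot product per row
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(m)
    return 1.0 - dots / norms                      # shape (n_rows,)

x = np.arange(1, 10).reshape(3, 3)
m = x.mean(axis=0)

d = cosine_to_vector(x, m)
print(d)
```

This returns a flat array of length 3, whereas the cdist call above returns shape (3, 1); calling .ravel() on the cdist result gives the same layout. The 2.22044605e-16 in the cdist output is floating-point round-off for a distance that is exactly zero in theory.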