Search code examples
pythonnumpyvectordot-productnumpy-einsum

Computation of variable interaction (dot product of vectors in a matrix)


If I multiply a vector x (1,n) with itself tansposed, i.e. np.dot(x.T, x) I will get a matrix in quadratic form.

If I have a matrix Xmat (k, n), how can I efficiently compute rowwise dot product and select only upper triangular elements?

So, atm. I have the following solution:

def compute_interaction(x):
    xx = np.reshape(x, (1, x.size))
    return np.concatenate((x, np.dot(xx.T, xx)[np.triu_indices(xx.size)]))

Then compute_interaction(np.asarray([2,5])) yield array([ 2, 5, 4, 10, 25]).

And when I have a matrix I use

np.apply_along_axis(compute_interaction, axis=1, arr = np.asarray([[2,5], [3,4], [8,9]]))

which yields what I want:

array([[ 2,  5,  4, 10, 25],
       [ 3,  4,  9, 12, 16],
       [ 8,  9, 64, 72, 81]])

Is there any other way than to compute this using apply_along_axis? Maybe using np.einsum?


Solution

  • Approach #1

    One solution with np.triu_indices would be -

    r,c = np.triu_indices(arr.shape[1])
    out = np.concatenate((arr,arr[:,r]*arr[:,c]),axis=1)
    

    Approach #2

    Faster one with slicing -

    def pairwise_col_mult(a):
        n = a.shape[1]
        N = n*(n+1)//2
        idx = n + np.concatenate(( [0], np.arange(n,0,-1).cumsum() ))
        start, stop = idx[:-1], idx[1:]
        out = np.empty((a.shape[0],n+N),dtype=a.dtype)
        out[:,:n] = a
        for j,i in enumerate(range(n)):
            out[:,start[j]:stop[j]] = a[:,[i]] * a[:,i:]
        return out
    

    Timings -

    In [254]: arr = np.random.randint(0,9,(10000,100))
    
    In [255]: %%timeit
         ...: r,c = np.triu_indices(arr.shape[1])
         ...: out = np.concatenate((arr,arr[:,r]*arr[:,c]),axis=1)
    1 loop, best of 3: 577 ms per loop
    
    In [256]: %timeit pairwise_col_mult(arr)
    1 loop, best of 3: 233 ms per loop