Tags: python, arrays, numpy, covariance, gaussian

Forming a covariance matrix for a 2D numpy array


I am trying to find a fully vectorised way to compute the covariance matrix of a 2D numpy array for a given base kernel function. For example, if the input is X = [[a,b],[c,d]], then for a kernel function k(x_1,x_2) the covariance matrix will be

K=[[k(a,a),k(a,b),k(a,c),k(a,d)], [k(b,a),k(b,b),k(b,c),k(b,d)], [k(c,a),k(c,b),k(c,c),k(c,d)], [k(d,a),k(d,b),k(d,c),k(d,d)]].

How do I go about doing this? I am unsure how to repeat the values and then apply the function, and what the most efficient way of doing this would be.


Solution

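  • You can use np.meshgrid to build two matrices that hold, for every pair of entries, the first and second argument to the k function. np.meshgrid flattens its inputs, so the 2x2 array X is treated as the 1-D sequence [0, 1, 2, 3].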

    In [7]: import numpy as np
    In [8]: X = np.arange(4).reshape(2,2)
    In [9]: np.meshgrid(X, X)
    Out[9]: 
    [array([[0, 1, 2, 3],
            [0, 1, 2, 3],
            [0, 1, 2, 3],
            [0, 1, 2, 3]]), 
     array([[0, 0, 0, 0],
            [1, 1, 1, 1],
            [2, 2, 2, 2],
            [3, 3, 3, 3]])]
    

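    You can then pass these matrices straight to the k function, provided k is written with element-wise NumPy operations. With the default indexing the first grid varies along the columns, so entry [i, j] of the result is k(x_j, x_i); for a symmetric kernel this is the same as k(x_i, x_j).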

    In [10]: k = lambda x1, x2: (x1-x2)**2
    
    In [11]: X1, X2 = np.meshgrid(X, X)
    
    In [12]: k(X1, X2)
    Out[12]: 
    array([[0, 1, 4, 9],
           [1, 0, 1, 4],
           [4, 1, 0, 1],
           [9, 4, 1, 0]])
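
As an alternative to np.meshgrid, the same matrix can be sketched with broadcasting, which does not build the two full n x n grids first. This is a minimal sketch, not part of the original answer: the helper name kernel_matrix, the gaussian kernel and its length_scale parameter are illustrative assumptions, and k is assumed to be written with element-wise NumPy operations.

```python
import numpy as np

def kernel_matrix(X, k):
    """Return K with K[i, j] = k(x[i], x[j]) over the flattened entries of X."""
    x = np.asarray(X).ravel()  # [[a, b], [c, d]] -> [a, b, c, d]
    # A column vector (n, 1) against a row vector (1, n) broadcasts to (n, n).
    return k(x[:, None], x[None, :])

# Hypothetical Gaussian (squared-exponential) kernel; length_scale=1.0 is an assumed default.
def gaussian(x1, x2, length_scale=1.0):
    return np.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

X = np.arange(4).reshape(2, 2)
K_sq = kernel_matrix(X, lambda x1, x2: (x1 - x2) ** 2)  # same kernel as in the answer
K_rbf = kernel_matrix(X, gaussian)
```

Here K_sq matches the Out[12] array above, and because the first argument runs along the rows, entry [i, j] is k(x_i, x_j) even for a kernel that is not symmetric.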