Search code examples
pythontheanoconvolutionsliding-window

How to put an "arbitrary" operation into a sliding window using Theano?


I want to define some function on a matrix X. For example mean(pow(X - X0, 2)), where X0 is another matrix (X0 is fixed / constant). To make it more specific, let's assume that both X and X0 are 10 x 10 matrices. The result of the operation is a real number.

Now I have a big matrix (let's say 500 x 500). I want to apply the operation defined above to all 10 x 10 sub-matrices of the "big" matrix. In other words, I want to slide the 10 x 10 window over the "big" matrix. For each location of the window, I should get a real number. So, as a final result, I need to get a real-valued matrix (or 2D tensor) (its shape should be 491 x 491).

What I want to have is close to a convolutional layer but not exactly the same because I want to use a mean squared deviation instead of a linear function represented by a neuron.


Solution

  • This is only a Numpy solution, hope it suffices. I assume that your function is made up of an operation on the matrix elements and of a mean, i.e. a scaled sum. Hence, it is sufficient to look at Y as in

    Y = np.power(X-X0, 2)
    

    So we only need to deal with determining a windowed mean. Note that for the 1D case the matrix product with an appropriate vector of ones can be determined for calculating the mean, e.g.

    h = np.array([0, 1, 1, 0])  # same dimension as y
    m1 = np.dot(h, y) / 2 
    m2 = (y[1] + y[2]) / 2
    print(m1 == m2)  # True
    

    The 2D case is analogous, but with two matrix multiplications, one for the rows and one for the columns. E.g.

    m_2 = np.dot(np.dot(h, Y), h) / 2**2
    

    To construct a sliding window, we need to build a matrix of shifted windows, e.g.

    H = [[1, 1, 1, 0, 0, ..., 0],
         [0, 1, 1, 1, 0, ..., 0],
                .
                .
                .
         [0, ..., 0, 0, 1, 1, 1]] 
    

    to calculate all the sums

    S = np.dot(np.dot(H, Y), H.T)
    

    A full example for a (n, n) matrix with a (m, m) window would be

    import numpy as np
    
    n, m = 500, 10
    X0 = np.ones((n, n))
    X = np.random.rand(n, n)
    Y = np.power(X-X0, 2)
    
    h = np.concatenate((np.ones(m), np.zeros(n-m)))  # window at position 0
    H = np.vstack((np.roll(h, k) for k in range(n+1-m)))  # slide the window 
    M = np.dot(np.dot(H,Y), H.T) / m**2  # calculate the mean
    print(M.shape)  # (491, 491)
    

    An alternative but probably slightly less efficient way for building H is

    H = np.sum(np.diag(np.ones(n-k), k)[:-m+1, :] for k in range(m))
    

    Update

    Calculating the mean squared deviation is also possible with that approach. For that, we generalize the vector identity |x-x0|^2 = (x-x0).T (x-x0) = x.T x - 2 x0.T x + x0.T x0 (a space denotes a scalar or matrix multiplication and .T a transposed vector) to the matrix case:

    We assume W is a (m,n) matrix containing a block (m.m) identity matrix, which is able to extract the (k0,k1)-th (m,m) sub-matrix by Y = W Z W.T, where Z is the (n,n) matrix containing the data. Calculating the difference

     D = Y - X0 = Y = W Z W.T - X0 
    

    is straightforward, where X0 and D is a (m,m) matrix. The square-root of the squared sum of the elements is called Frobenius norm. Based on those identities, we can write the squared sum as

    s = sum_{i,j} D_{i,j}^2 = trace(D.T D) = trace((W Z W.T - X0).T (H Z H.T - X0))
      = trace(W Z.T W.T W Z W.T) - 2 trace(X0.T W Z W.T) + trace(X0.T X0)
      =: Y0 + Y1 + Y2
    

    The term Y0 can be interpreted as H Z H.T from the method from above.The term Y1 can be interpreted as a weighted mean on Z and Y2 is a constant, which only needs to be determined once. Thus, a possible implementation would be:

    import numpy as np
    
    n, m = 500, 10
    x0 = np.ones(m)
    Z = np.random.rand(n, n)
    
    Y0 = Z**2
    h0 = np.concatenate((np.ones(m), np.zeros(n-m)))
    H0 = np.vstack((np.roll(h0, k) for k in range(n+1-m)))
    M0 = np.dot(np.dot(H0, Y0), H0.T)
    
    h1 = np.concatenate((-2*x0, np.zeros(n-m)))
    H1 = np.vstack((np.roll(h1, k) for k in range(n+1-m)))
    M1 = np.dot(np.dot(H1, Z), H0.T)
    
    Y2 = np.dot(x0, x0)
    M = (M0 + M1) / m**2 + Y2