python theano convolution sliding-window

How to put an "arbitrary" operation into a sliding window using Theano?

I want to define some function on a matrix X. For example mean(pow(X - X0, 2)), where X0 is another matrix (X0 is fixed / constant). To make it more specific, let's assume that both X and X0 are 10 x 10 matrices. The result of the operation is a real number.

Now I have a big matrix (let's say 500 x 500). I want to apply the operation defined above to all 10 x 10 sub-matrices of the "big" matrix. In other words, I want to slide the 10 x 10 window over the "big" matrix. For each location of the window, I should get a real number. So, as a final result, I need to get a real-valued matrix (or 2D tensor) (its shape should be 491 x 491).

What I want to have is close to a convolutional layer but not exactly the same because I want to use a mean squared deviation instead of a linear function represented by a neuron.

Solution

This is only a Numpy solution, hope it suffices. I assume that your function is made up of an operation on the matrix elements and of a mean, i.e. a scaled sum. Hence, it is sufficient to look at Y as in

Y = np.power(X-X0, 2)

So we only need to deal with determining a windowed mean. Note that for the 1D case the matrix product with an appropriate vector of ones can be determined for calculating the mean, e.g.

h = np.array([0, 1, 1, 0])  # same dimension as y
m1 = np.dot(h, y) / 2 
m2 = (y[1] + y[2]) / 2
print(m1 == m2)  # True

The 2D case is analogous, but with two matrix multiplications, one for the rows and one for the columns. E.g.

m_2 = np.dot(np.dot(h, Y), h) / 2**2

To construct a sliding window, we need to build a matrix of shifted windows, e.g.

H = [[1, 1, 1, 0, 0, ..., 0],
     [0, 1, 1, 1, 0, ..., 0],
            .
            .
            .
     [0, ..., 0, 0, 1, 1, 1]]

to calculate all the sums

S = np.dot(np.dot(H, Y), H.T)

A full example for a (n, n) matrix with a (m, m) window would be

import numpy as np

n, m = 500, 10
X0 = np.ones((n, n))
X = np.random.rand(n, n)
Y = np.power(X-X0, 2)

h = np.concatenate((np.ones(m), np.zeros(n-m)))  # window at position 0
H = np.vstack((np.roll(h, k) for k in range(n+1-m)))  # slide the window 
M = np.dot(np.dot(H,Y), H.T) / m**2  # calculate the mean
print(M.shape)  # (491, 491)

An alternative but probably slightly less efficient way for building H is

H = np.sum(np.diag(np.ones(n-k), k)[:-m+1, :] for k in range(m))

Update

Calculating the mean squared deviation is also possible with that approach. For that, we generalize the vector identity |x-x0|^2 = (x-x0).T (x-x0) = x.T x - 2 x0.T x + x0.T x0 (a space denotes a scalar or matrix multiplication and .T a transposed vector) to the matrix case:

We assume W is a (m,n) matrix containing a block (m.m) identity matrix, which is able to extract the (k0,k1)-th (m,m) sub-matrix by Y = W Z W.T, where Z is the (n,n) matrix containing the data. Calculating the difference

 D = Y - X0 = Y = W Z W.T - X0

is straightforward, where X0 and D is a (m,m) matrix. The square-root of the squared sum of the elements is called Frobenius norm. Based on those identities, we can write the squared sum as

s = sum_{i,j} D_{i,j}^2 = trace(D.T D) = trace((W Z W.T - X0).T (H Z H.T - X0))
  = trace(W Z.T W.T W Z W.T) - 2 trace(X0.T W Z W.T) + trace(X0.T X0)
  =: Y0 + Y1 + Y2

The term Y0 can be interpreted as H Z H.T from the method from above.The term Y1 can be interpreted as a weighted mean on Z and Y2 is a constant, which only needs to be determined once. Thus, a possible implementation would be:

import numpy as np

n, m = 500, 10
x0 = np.ones(m)
Z = np.random.rand(n, n)

Y0 = Z**2
h0 = np.concatenate((np.ones(m), np.zeros(n-m)))
H0 = np.vstack((np.roll(h0, k) for k in range(n+1-m)))
M0 = np.dot(np.dot(H0, Y0), H0.T)

h1 = np.concatenate((-2*x0, np.zeros(n-m)))
H1 = np.vstack((np.roll(h1, k) for k in range(n+1-m)))
M1 = np.dot(np.dot(H1, Z), H0.T)

Y2 = np.dot(x0, x0)
M = (M0 + M1) / m**2 + Y2