I want to define some function on a matrix X
. For example mean(pow(X - X0, 2))
, where X0
is another matrix (X0
is fixed / constant). To make it more specific, let's assume that both X
and X0
are 10 x 10
matrices. The result of the operation is a real number.
Now I have a big matrix (let's say 500 x 500
). I want to apply the operation defined above to all 10 x 10
sub-matrices of the "big" matrix. In other words, I want to slide the 10 x 10
window over the "big" matrix. For each location of the window, I should get a real number. So, as a final result, I need to get a real-valued matrix (or 2D tensor) (its shape should be 491 x 491
).
What I want to have is close to a convolutional layer but not exactly the same because I want to use a mean squared deviation instead of a linear function represented by a neuron.
This is only a Numpy solution, hope it suffices.
I assume that your function is made up of an operation on the matrix elements and of a mean, i.e. a scaled sum. Hence, it is sufficient to look at Y
as in
Y = np.power(X-X0, 2)
So we only need to deal with determining a windowed mean. Note that for the 1D case the matrix product with an appropriate vector of ones can be determined for calculating the mean, e.g.
h = np.array([0, 1, 1, 0]) # same dimension as y
m1 = np.dot(h, y) / 2
m2 = (y[1] + y[2]) / 2
print(m1 == m2) # True
The 2D case is analogous, but with two matrix multiplications, one for the rows and one for the columns. E.g.
m_2 = np.dot(np.dot(h, Y), h) / 2**2
To construct a sliding window, we need to build a matrix of shifted windows, e.g.
H = [[1, 1, 1, 0, 0, ..., 0],
[0, 1, 1, 1, 0, ..., 0],
.
.
.
[0, ..., 0, 0, 1, 1, 1]]
to calculate all the sums
S = np.dot(np.dot(H, Y), H.T)
A full example for a (n, n)
matrix with a (m, m)
window would be
import numpy as np
n, m = 500, 10
X0 = np.ones((n, n))
X = np.random.rand(n, n)
Y = np.power(X-X0, 2)
h = np.concatenate((np.ones(m), np.zeros(n-m))) # window at position 0
H = np.vstack((np.roll(h, k) for k in range(n+1-m))) # slide the window
M = np.dot(np.dot(H,Y), H.T) / m**2 # calculate the mean
print(M.shape) # (491, 491)
An alternative but probably slightly less efficient way for building H
is
H = np.sum(np.diag(np.ones(n-k), k)[:-m+1, :] for k in range(m))
Update
Calculating the mean squared deviation is also possible with that approach. For that, we generalize the vector identity |x-x0|^2 = (x-x0).T (x-x0) = x.T x - 2 x0.T x + x0.T x0
(a space denotes a scalar or matrix multiplication and .T
a transposed vector) to the matrix case:
We assume W
is a (m,n)
matrix containing a block (m.m)
identity matrix, which is able to extract the (k0,k1)
-th (m,m)
sub-matrix by Y = W Z W.T
, where Z
is the (n,n)
matrix containing the data. Calculating the difference
D = Y - X0 = Y = W Z W.T - X0
is straightforward, where X0
and D
is a (m,m)
matrix. The square-root of the squared sum of the elements is called Frobenius norm. Based on those identities, we can write the squared sum as
s = sum_{i,j} D_{i,j}^2 = trace(D.T D) = trace((W Z W.T - X0).T (H Z H.T - X0))
= trace(W Z.T W.T W Z W.T) - 2 trace(X0.T W Z W.T) + trace(X0.T X0)
=: Y0 + Y1 + Y2
The term Y0
can be interpreted as H Z H.T
from the method from above.The term Y1
can be interpreted as a weighted mean on Z
and Y2
is a constant, which only needs to be determined once.
Thus, a possible implementation would be:
import numpy as np
n, m = 500, 10
x0 = np.ones(m)
Z = np.random.rand(n, n)
Y0 = Z**2
h0 = np.concatenate((np.ones(m), np.zeros(n-m)))
H0 = np.vstack((np.roll(h0, k) for k in range(n+1-m)))
M0 = np.dot(np.dot(H0, Y0), H0.T)
h1 = np.concatenate((-2*x0, np.zeros(n-m)))
H1 = np.vstack((np.roll(h1, k) for k in range(n+1-m)))
M1 = np.dot(np.dot(H1, Z), H0.T)
Y2 = np.dot(x0, x0)
M = (M0 + M1) / m**2 + Y2