Tags: python, keras, keras-layer, batch-normalization

What are ways I can debug this Keras layer?


I am new to Keras and am attempting to implement the Decorrelated Batch Normalization paper (https://arxiv.org/abs/1804.08450) in Keras as a learning experience. The layer is very similar to standard batch norm, with a few additional components.

Instead of centering the input data to each layer and dividing by the standard deviation, we now center the data and apply a whitening transform, which is computed from an eigenvalue decomposition of the covariance matrix.

The entire procedure is clearly laid out in the paper (Algorithm 1, page 5) and consists of only about five equations, whose implementations I marked in the code below. I successfully re-implemented the standard batch norm layer, but I get a NaN loss and low accuracy when I incorporate the whitening procedure.
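For reference, here is my reading of those steps in plain NumPy (a sanity-check sketch, not the layer code itself; X stands for the unrolled (channels, m) activation matrix):

import numpy as np

def zca_whiten_reference(X, eps=1e-5):
    # X: (c, m), one column per example, m = batch*height*width
    c, m = X.shape
    mu = X.mean(axis=1, keepdims=True)          # mean per channel
    X_c = X - mu                                # center the data
    sigma = X_c @ X_c.T / m + eps * np.eye(c)   # covariance, fuzzed on the diagonal
    lam, D = np.linalg.eigh(sigma)              # eigendecomposition: sigma = D diag(lam) D^T
    W = D @ np.diag(1.0 / np.sqrt(lam)) @ D.T   # ZCA whitening matrix D Lambda^{-1/2} D^T
    return W @ X_c                              # whitened activations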

I am wondering if there is any advice I should follow to debug this code. I am not sure whether I made a dimensionality mistake or implemented the equations incorrectly, but any help would be appreciated.

Here is the code if you are interested (edited to include Daniel Möller's corrections). The input to the layer is a tensor of dimension (batch_size, height, width, channels).

input_shape = K.int_shape(inputs)  # (batch_size, height, width, channels)
# unroll all dimensions except the feature-map dim: (b*h*w, c)
pool_shape = (-1, input_shape[-1])
x = K.reshape(inputs, pool_shape)
x = K.permute_dimensions(x, (1, 0))  # transpose to (c, b*h*w)

mean = K.mean(x, 1, keepdims=True)  # per-channel mean, shape (c, 1)

# standard batch norm
#stddev = K.std(x,1,keepdims=True) + self.epsilon
#normed = (x - mean) / stddev
#normed = K.reshape(normed,((-1,)+ input_shape[1:]))

# center inputs
centered_inputs = x - mean 

#vvvvvERROR SOMEWHERE IN HEREvvvvv#
# compute covariance matrix for reshaped inputs xxt
covar = K.batch_dot(K.expand_dims(x, axis=-1), K.expand_dims(x, axis=-1),axes=(2,2))
# fuzz covariance matrix to prevent singularity
covar = covar + self.epsilon 

# execute eigenvalue decomposition
#Lambda, D,_ = tf.svd(covar,compute_uv=True)
Lambda, D = tf.self_adjoint_eig(covar)
Lambda = tf.linalg.diag(Lambda)

# calculate PCA-whitening matrix 1/sqrt(L) * D^T
U = K.batch_dot(1. / K.sqrt(Lambda), D, axes=(2,2))
# calculate PCA-whitened activation x_a = U(x - \mu)
x_a = K.batch_dot(U, centered_inputs,axes=(2,1))
# calculate ZCA-whitened output Dx_a
x_whitened = K.batch_dot(D, x_a)
#^^^^^ERROR SOMEWHERE IN HERE^^^^^# 

# reshape whitened activations back to input dimension
x_normed = K.permute_dimensions(x_whitened, (1, 0))  # permute back to (b*h*w, c)
x_normed = K.reshape(x_normed, (-1,) + input_shape[1:])  # reroll dimensions

Solution

  • Suppose your code runs inside a Keras layer, either a custom layer or a Lambda layer.

    The best way I have found to debug things like this is to create a very small model with only that layer and see what it outputs.

    If the problem lies within the code, I gradually move the return statement up to the point where I believe the error is (sketched below).

    debugModel = Sequential()
    debugModel.add(MyCustomLayer(...., input_shape=some_shape))
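    Applied to the layer in the question, "moving the return up" could look like this (a sketch using the question's own variables; everything below the return is temporarily dead code):

    def call(self, inputs):
        input_shape = K.int_shape(inputs)
        x = K.reshape(inputs, (-1, input_shape[-1]))
        x = K.permute_dimensions(x, (1, 0))
        mean = K.mean(x, 1, keepdims=True)
        return x - mean  # temporary early return; move it down one step at a time
        # ... covariance, eigendecomposition, whitening ...

    If the layer's compute_output_shape gets in the way of an early return, wrapping the same fragment in a Lambda layer avoids that.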
    

    Create dummy or useful data:

    data = loadOrCreateSomeData()
    

    Or get the data from the previous layer with a submodel:

    subModel = Model(oldModel.inputs, oldModel.get_layer(nameOfATargetLayer).output)
    data = subModel.predict(inputData)
    

    After having suitable data for the test:

    result = debugModel.predict(data)
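    For instance, a complete self-contained check of the first few lines of your code with random data could look like this (a sketch; the Lambda wraps whichever fragment is under suspicion, and the input shape and data are made up):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Lambda
    from keras import backend as K

    def suspect_fragment(t):
        # same first steps as the question's layer code
        shape = K.int_shape(t)
        t = K.reshape(t, (-1, shape[-1]))
        t = K.permute_dimensions(t, (1, 0))
        return t - K.mean(t, 1, keepdims=True)

    debugModel = Sequential()
    debugModel.add(Lambda(suspect_fragment, input_shape=(8, 8, 4)))

    data = np.random.rand(16, 8, 8, 4).astype('float32')
    result = debugModel.predict(data)
    print(result.shape)            # is this the shape you expect?
    print(np.isnan(result).any())  # have NaNs already appeared here?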
    

    Some comments about your code:

    Ungrouped dimensions

    In the following lines, you are inverting the dimensions in the reshape, which usually messes up your data completely, because the dimensions lose meaning. (You're not doing a proper transpose, you're just regrouping the numbers in a different way; see the small NumPy demo at the end of this answer.)

    pool_shape = (input_shape[-1], np.prod(input_shape[1:-1])*self.batch_size) 
    x = K.reshape(x,pool_shape) 
    

    I suppose you should be trying this:

    pool_shape = (-1, input_shape[-1])
    x = K.reshape(x,pool_shape) 
    

    And maybe this:

    x = K.permute_dimensions(x, (1,0)) #if you do want to invert the dimensions
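    To see concretely that a reshape is not a transpose, here is a quick NumPy check:

    import numpy as np

    a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
    print(a.reshape(3, 2))          # [[0, 1], [2, 3], [4, 5]] -- rows regrouped, data scrambled
    print(a.T)                      # [[0, 3], [1, 4], [2, 5]] -- a proper transpose

    Only the transpose keeps each value attached to its original channel; the plain reshape regroups the numbers, so any statistics computed afterwards no longer correspond to channels.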