python tensorflow keras conv-neural-network keras-layer

Train a network in Keras consisting only of Conv2D layers

I am trying to train my own model in Keras on MNIST. It uses only Conv2D layers because I want to train the network on small images (MNIST: 28x28 px) and later run inference on large images (1920x1080).

My model summary (for training):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1 (Conv2D)               (None, 28, 28, 64)        640       
_________________________________________________________________
batch_normalization_117 (Bat (None, 28, 28, 64)        256       
_________________________________________________________________
leaky_re_lu_117 (LeakyReLU)  (None, 28, 28, 64)        0         
_________________________________________________________________
max_pooling2d_119 (MaxPoolin (None, 14, 14, 64)        0         
_________________________________________________________________
conv2 (Conv2D)               (None, 14, 14, 128)       73856     
_________________________________________________________________
batch_normalization_118 (Bat (None, 14, 14, 128)       512       
_________________________________________________________________
leaky_re_lu_118 (LeakyReLU)  (None, 14, 14, 128)       0         
_________________________________________________________________
max_pooling2d_120 (MaxPoolin (None, 7, 7, 128)         0         
_________________________________________________________________
conv3 (Conv2D)               (None, 7, 7, 256)         295168    
_________________________________________________________________
batch_normalization_119 (Bat (None, 7, 7, 256)         1024      
_________________________________________________________________
leaky_re_lu_119 (LeakyReLU)  (None, 7, 7, 256)         0         
_________________________________________________________________
max_pooling2d_121 (MaxPoolin (None, 4, 4, 256)         0         
_________________________________________________________________
conv4 (Conv2D)               (None, 4, 4, 128)         295040    
_________________________________________________________________
batch_normalization_120 (Bat (None, 4, 4, 128)         512       
_________________________________________________________________
leaky_re_lu_120 (LeakyReLU)  (None, 4, 4, 128)         0         
_________________________________________________________________
max_pooling2d_122 (MaxPoolin (None, 2, 2, 128)         0         
_________________________________________________________________
conv5 (Conv2D)               (None, 1, 1, 10)          5130      
=================================================================
Total params: 672,138
Trainable params: 670,986
Non-trainable params: 1,152

Model summary for inference:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1 (Conv2D)               (None, 1920, 1080, 64)    640       
_________________________________________________________________
batch_normalization_113 (Bat (None, 1920, 1080, 64)    256       
_________________________________________________________________
leaky_re_lu_113 (LeakyReLU)  (None, 1920, 1080, 64)    0         
_________________________________________________________________
max_pooling2d_115 (MaxPoolin (None, 960, 540, 64)      0         
_________________________________________________________________
conv2 (Conv2D)               (None, 960, 540, 128)     73856     
_________________________________________________________________
batch_normalization_114 (Bat (None, 960, 540, 128)     512       
_________________________________________________________________
leaky_re_lu_114 (LeakyReLU)  (None, 960, 540, 128)     0         
_________________________________________________________________
max_pooling2d_116 (MaxPoolin (None, 480, 270, 128)     0         
_________________________________________________________________
conv3 (Conv2D)               (None, 480, 270, 256)     295168    
_________________________________________________________________
batch_normalization_115 (Bat (None, 480, 270, 256)     1024      
_________________________________________________________________
leaky_re_lu_115 (LeakyReLU)  (None, 480, 270, 256)     0         
_________________________________________________________________
max_pooling2d_117 (MaxPoolin (None, 240, 135, 256)     0         
_________________________________________________________________
conv4 (Conv2D)               (None, 240, 135, 128)     295040    
_________________________________________________________________
batch_normalization_116 (Bat (None, 240, 135, 128)     512       
_________________________________________________________________
leaky_re_lu_116 (LeakyReLU)  (None, 240, 135, 128)     0         
_________________________________________________________________
max_pooling2d_118 (MaxPoolin (None, 120, 68, 128)      0         
_________________________________________________________________
conv5 (Conv2D)               (None, 119, 67, 10)       5130      
=================================================================
Total params: 672,138
Trainable params: 670,986
Non-trainable params: 1,152

The goal is to produce an output map with one channel per class, where each spatial position corresponds to one sliding window over the large inference image.

But Keras will not let me train, because the output of the last layer has shape (batch, x, y, channels) while my labels have shape (batch, channels):

ValueError: Error when checking target: expected conv5 to have 4 dimensions, but got array with shape (48000, 10)

The shape needs to be (48000, 1, 1, 10). What can I do to prevent this? If I introduce Flatten and Dense layers, can I no longer use the model for inference on large images?
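For reference, reshaping the labels to that 4-D layout could look roughly like the NumPy sketch below. The names `labels` and `y_train` are placeholders and the random labels only stand in for the real MNIST targets, which are not shown here.

```python
import numpy as np

num_samples, num_classes = 48000, 10

# Placeholder integer labels standing in for the real MNIST targets.
labels = np.random.randint(0, num_classes, size=num_samples)

# One-hot encode: shape (48000, 10), the shape reported in the error message.
y_train = np.eye(num_classes, dtype="float32")[labels]

# Insert two singleton spatial axes so the targets match a (None, 1, 1, 10) output.
y_train_4d = y_train.reshape(-1, 1, 1, num_classes)

print(y_train.shape)     # (48000, 10)
print(y_train_4d.shape)  # (48000, 1, 1, 10)
```

This only changes the layout of the targets; the values of each one-hot vector stay the same.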

Thanks for your time and help.

Solution

To be able to train and test on different input sizes, there are two things you should do:

1. Use None for the spatial input dimensions (height and width).
2. Use GlobalAveragePooling2D after a final Conv2D layer whose number of filters equals the number of categories.
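To illustrate why the second point removes the shape mismatch, here is a small NumPy sketch of what global average pooling computes for channels-last data. It mimics the operation rather than calling Keras, and the function name `global_average_pool` is made up for this example.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average a (batch, height, width, channels) array over height and width."""
    return feature_maps.mean(axis=(1, 2))


# Feature maps of different spatial sizes, each with 10 class channels.
small = np.random.rand(4, 5, 5, 10)
large = np.random.rand(4, 60, 34, 10)

print(global_average_pool(small).shape)  # (4, 10)
print(global_average_pool(large).shape)  # (4, 10)
```

Whatever the spatial size, the result is (batch, 10), which matches one-hot labels of shape (batch, 10).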

The following sample code creates a model that can be trained and used for inference on images of any input size (provided that the convolutions and max pooling do not shrink a spatial dimension below 1).

from keras import layers, Model

# None for height and width lets the model accept images of any size
my_input = layers.Input(shape=(None, None, 1))

x = layers.Conv2D(filters=32, kernel_size=3, strides=1)(my_input)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(filters=64, kernel_size=3, strides=1)(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)

# One filter per category, then average each class map over all spatial positions
out = layers.Conv2D(filters=10, kernel_size=1, strides=1)(x)
out = layers.GlobalAveragePooling2D()(out)
out = layers.Activation('softmax')(out)
model = Model(my_input, out)
model.summary()

The model summary prints this:

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, None, 1)     0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, None, None, 32)    320       
_________________________________________________________________
batch_normalization_1 (Batch (None, None, None, 32)    128       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, None, None, 32)    0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, None, None, 64)    18496     
_________________________________________________________________
batch_normalization_2 (Batch (None, None, None, 64)    256       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, None, None, 64)    0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, None, None, 10)    650       
_________________________________________________________________
global_average_pooling2d_1 ( (None, 10)                0         
_________________________________________________________________
activation_1 (Activation)    (None, 10)                0         
=================================================================
Total params: 19,850
Trainable params: 19,658
Non-trainable params: 192
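
Because the summary shows None for the spatial dimensions, it does not tell you whether a given image is large enough. The pure-Python sketch below estimates the feature-map size that reaches GlobalAveragePooling2D for the sample model, assuming the Keras defaults used there ('valid' padding, 2x2 max pooling with stride 2). The helper names `valid_output_size` and `feature_map_size` are made up for this example.

```python
def valid_output_size(size, kernel, stride=1):
    """Output length along one axis for a convolution or pooling with 'valid' padding."""
    return (size - kernel) // stride + 1


# (kernel, stride) per shape-changing layer of the sample model, in order.
SAMPLE_MODEL_LAYERS = [
    (3, 1),  # Conv2D(filters=32, kernel_size=3)
    (2, 2),  # MaxPooling2D() with its default 2x2 pool
    (3, 1),  # Conv2D(filters=64, kernel_size=3)
    (2, 2),  # MaxPooling2D()
    (1, 1),  # Conv2D(filters=10, kernel_size=1)
]


def feature_map_size(height, width, layers=SAMPLE_MODEL_LAYERS):
    """Return the spatial size after all layers, or raise ValueError if the input is too small."""
    sizes = [height, width]
    for kernel, stride in layers:
        sizes = [valid_output_size(s, kernel, stride) for s in sizes]
        if min(sizes) < 1:
            raise ValueError(f"input {height}x{width} is too small for this layer stack")
    return tuple(sizes)


print(feature_map_size(28, 28))      # (5, 5)
print(feature_map_size(1920, 1080))  # (478, 268)
print(feature_map_size(10, 10))      # (1, 1), the smallest input that still works
```

With these layers, anything smaller than 10 pixels along either axis raises the ValueError, which corresponds to the caveat about dimensions shrinking too far.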