I trained my own model in Keras on MNIST. I use only Conv2D layers (no Dense) because I want to train the network on small images (MNIST: 28x28 px) and later run inference on large images (1920x1080).
My model summary (for training):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1 (Conv2D) (None, 28, 28, 64) 640
_________________________________________________________________
batch_normalization_117 (Bat (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_117 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
max_pooling2d_119 (MaxPoolin (None, 14, 14, 64) 0
_________________________________________________________________
conv2 (Conv2D) (None, 14, 14, 128) 73856
_________________________________________________________________
batch_normalization_118 (Bat (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_118 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
max_pooling2d_120 (MaxPoolin (None, 7, 7, 128) 0
_________________________________________________________________
conv3 (Conv2D) (None, 7, 7, 256) 295168
_________________________________________________________________
batch_normalization_119 (Bat (None, 7, 7, 256) 1024
_________________________________________________________________
leaky_re_lu_119 (LeakyReLU) (None, 7, 7, 256) 0
_________________________________________________________________
max_pooling2d_121 (MaxPoolin (None, 4, 4, 256) 0
_________________________________________________________________
conv4 (Conv2D) (None, 4, 4, 128) 295040
_________________________________________________________________
batch_normalization_120 (Bat (None, 4, 4, 128) 512
_________________________________________________________________
leaky_re_lu_120 (LeakyReLU) (None, 4, 4, 128) 0
_________________________________________________________________
max_pooling2d_122 (MaxPoolin (None, 2, 2, 128) 0
_________________________________________________________________
conv5 (Conv2D) (None, 1, 1, 10) 5130
=================================================================
Total params: 672,138
Trainable params: 670,986
Non-trainable params: 1,152
Model summary for inference:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1 (Conv2D) (None, 1920, 1080, 64) 640
_________________________________________________________________
batch_normalization_113 (Bat (None, 1920, 1080, 64) 256
_________________________________________________________________
leaky_re_lu_113 (LeakyReLU) (None, 1920, 1080, 64) 0
_________________________________________________________________
max_pooling2d_115 (MaxPoolin (None, 960, 540, 64) 0
_________________________________________________________________
conv2 (Conv2D) (None, 960, 540, 128) 73856
_________________________________________________________________
batch_normalization_114 (Bat (None, 960, 540, 128) 512
_________________________________________________________________
leaky_re_lu_114 (LeakyReLU) (None, 960, 540, 128) 0
_________________________________________________________________
max_pooling2d_116 (MaxPoolin (None, 480, 270, 128) 0
_________________________________________________________________
conv3 (Conv2D) (None, 480, 270, 256) 295168
_________________________________________________________________
batch_normalization_115 (Bat (None, 480, 270, 256) 1024
_________________________________________________________________
leaky_re_lu_115 (LeakyReLU) (None, 480, 270, 256) 0
_________________________________________________________________
max_pooling2d_117 (MaxPoolin (None, 240, 135, 256) 0
_________________________________________________________________
conv4 (Conv2D) (None, 240, 135, 128) 295040
_________________________________________________________________
batch_normalization_116 (Bat (None, 240, 135, 128) 512
_________________________________________________________________
leaky_re_lu_116 (LeakyReLU) (None, 240, 135, 128) 0
_________________________________________________________________
max_pooling2d_118 (MaxPoolin (None, 120, 68, 128) 0
_________________________________________________________________
conv5 (Conv2D) (None, 119, 67, 10) 5130
=================================================================
Total params: 672,138
Trainable params: 670,986
Non-trainable params: 1,152
The goal is to produce a convolved output map with one channel per class, whose spatial positions correspond to sliding windows over the large image at inference time.
But Keras will not let me train, because the last layer outputs a 4-D tensor (batch, x, y, channels) while my target labels only have shape (batch, channels):
ValueError: Error when checking target: expected conv5 to have 4 dimensions, but got array with shape (48000, 10)
The target shape would need to be (48000, 1, 1, 10)! What can I do about this? If I introduce Flatten and Dense layers, I can no longer use the model for inference on big images.
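Reshaping the labels to that 4-D shape would at least satisfy the shape check (a NumPy sketch; the arrays here are illustrative stand-ins for my one-hot labels):

```python
import numpy as np

# illustrative stand-in for 48000 one-hot encoded MNIST labels
y_train = np.zeros((48000, 10))
y_train[np.arange(48000), np.random.randint(0, 10, 48000)] = 1.0

# expand to the 4-D shape conv5 expects: (batch, 1, 1, classes)
y_train_4d = y_train.reshape(-1, 1, 1, 10)
print(y_train_4d.shape)  # (48000, 1, 1, 10)
```

But that feels like a workaround rather than the right architecture.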
Thanks for your time and help.
To be able to train and test on different input sizes, there are two things you should do:

1. Use None as the spatial input dimensions.
2. Use GlobalAveragePooling2D after a Conv2D layer whose number of filters equals the number of categories.

The following sample code creates a model that can train and do inference on images of any input size (given that max pooling and striding do not lead to negative dimensions).
from keras import layers, Model

# input with undefined spatial dimensions, so any image size is accepted
my_input = layers.Input(shape=(None, None, 1))
x = layers.Conv2D(filters=32, kernel_size=3, strides=1)(my_input)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(filters=64, kernel_size=3, strides=1)(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)
# 1x1 convolution producing one feature map per class
out = layers.Conv2D(filters=10, kernel_size=1, strides=1)(x)
# average each class map over the spatial dimensions -> (batch, 10)
out = layers.GlobalAveragePooling2D()(out)
out = layers.Activation('softmax')(out)
model = Model(my_input, out)
model.summary()
The model summary prints this:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, None, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, None, None, 32) 320
_________________________________________________________________
batch_normalization_1 (Batch (None, None, None, 32) 128
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, None, None, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, None, None, 64) 18496
_________________________________________________________________
batch_normalization_2 (Batch (None, None, None, 64) 256
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, None, None, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, None, None, 10) 650
_________________________________________________________________
global_average_pooling2d_1 ( (None, 10) 0
_________________________________________________________________
activation_1 (Activation) (None, 10) 0
=================================================================
Total params: 19,850
Trainable params: 19,658
Non-trainable params: 192
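The reason the summary ends at (None, 10) regardless of input size is GlobalAveragePooling2D, which simply averages each class map over the spatial axes. A NumPy sketch of that reduction (the arrays are illustrative feature maps, not real model activations):

```python
import numpy as np

def global_average_pooling_2d(x):
    # average over height and width: (batch, H, W, C) -> (batch, C)
    return x.mean(axis=(1, 2))

small = np.random.rand(2, 1, 1, 10)     # e.g. what a 28x28 input yields after the conv stack
large = np.random.rand(2, 119, 67, 10)  # e.g. what a 1920x1080 input yields

print(global_average_pooling_2d(small).shape)  # (2, 10)
print(global_average_pooling_2d(large).shape)  # (2, 10)
```

Because the output shape depends only on the channel count, the same (batch, 10) targets work for training on small images and inference on large ones.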