
Tensorflow ValueError: Operands could not be broadcast together with shapes (5, 5, 160) (19, 19, 80)


I was creating a CNN with 80 filters in the first hidden layer, 160 in the rest of the conv layers, and 128 units in the last hidden (dense) layer. But I keep running into an error message and I don't really know what it means. The input shape I feed into the network is (80, 80, 1).

Here is the code to create the CNN:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras.layers import (Activation, Add, BatchNormalization,
                                         Conv2D, Dense, Flatten)
    from tensorflow.keras.models import load_model
    from tensorflow.keras.optimizers import Adam

    if start_model is not None:
        model = load_model(start_model)
    else:
        def res_net_block(input_layers, conv_size, hm_filters, hm_strides):
            x = Conv2D(conv_size, kernel_size=hm_filters, strides=hm_strides, activation="relu", padding="same")(input_layers)
            x = BatchNormalization()(x)
            x = Conv2D(conv_size, kernel_size=hm_filters, strides=hm_strides, activation=None, padding="same")(x)
            x = Add()([x, input_layers])  # Creates resnet block
            x = Activation("relu")(x)
            return x

        input = keras.Input(i_shape)
        x = Conv2D(80, kernel_size=8, strides=4, activation="relu")(input)
        x = BatchNormalization()(x)

        for i in range(3):
            x = res_net_block(x, 160, 4, 2)

        x = Conv2D(160, kernel_size=4, strides=2, activation="relu")(x)
        x = BatchNormalization()(x)

        x = Flatten(input_shape=(np.prod(window_size), 1, 1))(x)

        x = Dense(128, activation="relu")(x)

        output = Dense(action_space_size, activation="linear")(x)

        model = keras.Model(input, output)

        model.compile(optimizer=Adam(lr=0.01), loss="mse", metrics=["accuracy"])

BTW, the error is raised at the line `x = Add()([x, input_layers])` in the code.
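Tracing the spatial sizes explains the two shapes in the error message. Below is a small sketch of the output-size arithmetic Keras applies per dimension (the `conv_out` helper is hypothetical, written here just to mirror Conv2D's "valid"/"same" rules):

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a Conv2D along one dimension (Keras rules)."""
    if padding == "same":
        return math.ceil(size / stride)
    # "valid" is the Keras default when no padding is given
    return (size - kernel) // stride + 1

# First Conv2D in the question: kernel 8, stride 4, no padding argument
s = conv_out(80, kernel=8, stride=4, padding="valid")
print(s)  # 19  -> the shortcut tensor is (19, 19, 80)

# Inside res_net_block both convs use padding="same" with stride 2,
# so each halves the spatial size: 19 -> 10 -> 5
s2 = conv_out(s, kernel=4, stride=2, padding="same")
s3 = conv_out(s2, kernel=4, stride=2, padding="same")
print(s2, s3)  # 10 5  -> the block output is (5, 5, 160)
```

So `Add()` is asked to combine a `(5, 5, 160)` tensor with the `(19, 19, 80)` shortcut, which is exactly the pair of shapes in the ValueError.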


Solution

  • If you apply a convolution with kernel_size > 1 (and the default "valid" padding) or with strides > 1, the output has smaller spatial dimensions than the input.

    For example:

    Conv2D(filters=6, kernel_size=5, strides=2)
    

    would take an input of dimension (32, 32, 1) and give an output of dimension (14, 14, 6). This causes a problem if you try to add the result to a ResNet-style shortcut branch, because it isn't clear how to add two tensors of different dimensions.

    There are several ways to deal with this.

    • Do not reduce the spatial dimensions in the convolution (keep strides=1)
    • Downsample the shortcut branch with a 1x1 convolution that uses the same strides as the main Conv2D
    • Match the number of output channels of the shortcut branch to the number of filters in the main Conv2D
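The second and third bullets are usually combined into a "projection shortcut". One possible restructuring of the question's `res_net_block` is sketched below (an assumption about how you might rewrite it, not the only fix; note the second conv now uses strides=1 so the main path and the shortcut downsample by the same factor):

```python
from tensorflow import keras
from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def res_net_block(input_layers, conv_size, hm_filters, hm_strides):
    # Main path: only the first conv downsamples
    x = Conv2D(conv_size, kernel_size=hm_filters, strides=hm_strides,
               activation="relu", padding="same")(input_layers)
    x = BatchNormalization()(x)
    x = Conv2D(conv_size, kernel_size=hm_filters, strides=1,
               activation=None, padding="same")(x)
    # Projection shortcut: a 1x1 conv with the same strides and filter count
    # as the main path, so both tensors end up with identical shapes
    shortcut = Conv2D(conv_size, kernel_size=1, strides=hm_strides,
                      padding="same")(input_layers)
    x = Add()([x, shortcut])
    return Activation("relu")(x)

# (19, 19, 80) is the shape coming out of the question's first Conv2D
inp = keras.Input((19, 19, 80))
out = res_net_block(inp, 160, 4, 2)
print(out.shape)  # (None, 10, 10, 160) -- the Add now succeeds
```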