
Tensorflow ValueError: Operands could not be broadcast together with shapes (5, 5, 160) (19, 19, 80)


I was creating a CNN with 80 filters in the first hidden layer, 160 in the rest of the conv layers, and 128 units in the last hidden (dense) layer. But I keep running into an error message and I don't really know what it means. The input shape I feed into the network is (80, 80, 1).

Here is the code to create the CNN:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras.layers import (Activation, Add, BatchNormalization,
                                         Conv2D, Dense, Flatten)
    from tensorflow.keras.models import load_model
    from tensorflow.keras.optimizers import Adam

    if start_model is not None:
        model = load_model(start_model)
    else:
        def res_net_block(input_layers, conv_size, hm_filters, hm_strides):
            x = Conv2D(conv_size, kernel_size=hm_filters, strides=hm_strides, activation="relu", padding="same")(input_layers)
            x = BatchNormalization()(x)
            x = Conv2D(conv_size, kernel_size=hm_filters, strides=hm_strides, activation=None, padding="same")(x)
            x = Add()([x, input_layers])  # Creates resnet block
            x = Activation("relu")(x)
            return x

        input = keras.Input(i_shape)
        x = Conv2D(80, kernel_size=8, strides=4, activation="relu")(input)
        x = BatchNormalization()(x)

        for i in range(3):
            x = res_net_block(x, 160, 4, 2)

        x = Conv2D(160, kernel_size=4, strides=2, activation="relu")(x)
        x = BatchNormalization()(x)

        x = Flatten(input_shape=(np.prod(window_size), 1, 1))(x)

        x = Dense(128, activation="relu")(x)

        output = Dense(action_space_size, activation="linear")(x)

        model = keras.Model(input, output)

        model.compile(optimizer=Adam(lr=0.01), loss="mse", metrics=["accuracy"])

BTW, the error is raised at the line `x = Add()([x, input_layers])` in the code.
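Tracing the spatial sizes explains the two shapes in the error message. Below is a small sketch of the output-size arithmetic Keras applies per dimension (the `conv_out` helper is hypothetical, written here just to mirror Conv2D's "valid"/"same" rules):

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a Conv2D along one dimension (Keras rules)."""
    if padding == "same":
        return math.ceil(size / stride)
    # "valid" is the Keras default when no padding is given
    return (size - kernel) // stride + 1

# First Conv2D in the question: kernel 8, stride 4, no padding argument
s = conv_out(80, kernel=8, stride=4, padding="valid")
print(s)  # 19  -> the shortcut tensor is (19, 19, 80)

# Inside res_net_block both convs use padding="same" with stride 2,
# so each halves the spatial size: 19 -> 10 -> 5
s2 = conv_out(s, kernel=4, stride=2, padding="same")
s3 = conv_out(s2, kernel=4, stride=2, padding="same")
print(s2, s3)  # 10 5  -> the block output is (5, 5, 160)
```

So `Add()` is asked to combine a `(5, 5, 160)` tensor with the `(19, 19, 80)` shortcut, which is exactly the pair of shapes in the ValueError.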


Solution

  • If you apply a convolution with kernel_size > 1 (and the default "valid" padding) or with strides > 1, the output has smaller spatial dimensions than the input.

    For example:

    Conv2D(filters=6, kernel_size=5, strides=2)
    

    would take an input of dimension (32, 32, 1) and give an output of dimension (14, 14, 6). This causes a problem if you try to add the result to a ResNet-style shortcut branch, because it isn't clear how to add two tensors of different dimensions.

    There are several ways to deal with this.

    • Do not reduce the spatial dimensions in the convolution (keep strides=1)
    • Downsample the shortcut branch with a 1x1 convolution that uses the same strides as the main Conv2D
    • Match the number of output channels of the shortcut branch to the number of filters in the main Conv2D
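The second and third bullets are usually combined into a "projection shortcut". One possible restructuring of the question's `res_net_block` is sketched below (an assumption about how you might rewrite it, not the only fix; note the second conv now uses strides=1 so the main path and the shortcut downsample by the same factor):

```python
from tensorflow import keras
from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def res_net_block(input_layers, conv_size, hm_filters, hm_strides):
    # Main path: only the first conv downsamples
    x = Conv2D(conv_size, kernel_size=hm_filters, strides=hm_strides,
               activation="relu", padding="same")(input_layers)
    x = BatchNormalization()(x)
    x = Conv2D(conv_size, kernel_size=hm_filters, strides=1,
               activation=None, padding="same")(x)
    # Projection shortcut: a 1x1 conv with the same strides and filter count
    # as the main path, so both tensors end up with identical shapes
    shortcut = Conv2D(conv_size, kernel_size=1, strides=hm_strides,
                      padding="same")(input_layers)
    x = Add()([x, shortcut])
    return Activation("relu")(x)

# (19, 19, 80) is the shape coming out of the question's first Conv2D
inp = keras.Input((19, 19, 80))
out = res_net_block(inp, 160, 4, 2)
print(out.shape)  # (None, 10, 10, 160) -- the Add now succeeds
```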