Keras(FIT_GENERATOR)- Error, when checking target: expected activation_1 to have 3 dimensions, but got array with shape (32, 416, 608, 3)

I've been working on a segmentation problem for many days and after finally finding out how to properly read the dataset, I ran into this problem:

ValueError: Error when checking target: expected activation_1(Softmax) to have 3 dimensions, but got array with shape

(32, 416, 608, 3)

I used the functional API, since I took the FCNN architecture from [here](https://github.com/divamgupta/image-segmentation-keras/blob/master/Models/FCN32.py).

It is slightly modified and adapted in accordance with my task(IMAGE_ORDERING = "channels_last"(TensorFlow backend)). Can anyone please help me? Massive thanks in advance. The architecture below is for FCNN, which I try to implement for the purpose of the segmentation. Here is the architecture(after calling model.summary()):

The specific error is:
"Importing the dataset" function:

"Fit_Generator method calling":

 img_input = Input(shape=(input_height,input_width,3))

 #Block 1
 x = Convolution2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', data_format=IMAGE_ORDERING)(img_input) 
 x = BatchNormalization()(x)
 x = Convolution2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', data_format=IMAGE_ORDERING)(x)
 f1 = x
 # Block 2
 x = Convolution2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool', data_format=IMAGE_ORDERING )(x)
 f2 = x

 # Block 3
 x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool', data_format=IMAGE_ORDERING )(x)
 f3 = x

 # Block 4
 x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2',data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3',data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool', data_format=IMAGE_ORDERING)(x)
 f4 = x

 # Block 5
 x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2',data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3', data_format=IMAGE_ORDERING)(x)
 x = BatchNormalization()(x)
 x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool', data_format=IMAGE_ORDERING)(x)
 f5 = x

 x = (Convolution2D(4096,(7,7) , activation='relu' , padding='same', data_format=IMAGE_ORDERING))(x)
 x = Dropout(0.5)(x)
 x = (Convolution2D(4096,(1,1) , activation='relu' , padding='same',data_format=IMAGE_ORDERING))(x)
 x = Dropout(0.5)(x)

 #First parameter = number of classes+1 (de la background)
 x = (Convolution2D(20,(1,1) ,kernel_initializer='he_normal' ,data_format=IMAGE_ORDERING))(x)
 x = Convolution2DTranspose(20,kernel_size=(64,64), strides=(32,32),use_bias=False,data_format=IMAGE_ORDERING)(x)
 o_shape = Model(img_input,x).output_shape

 outputHeight = o_shape[1]
 print('Output Height is:', outputHeight)
 outputWidth = o_shape[2]
 print('Output Width is:', outputWidth)
 #https://keras.io/layers/core/#reshape
 x = (Reshape((20,outputHeight*outputWidth)))(x)
 #https://keras.io/layers/core/#permute
 x = (Permute((2, 1)))(x)
 print("Output shape before softmax is", o_shape)
 x = (Activation('softmax'))(x)
 print("Output shape after softmax is", o_shape)
 model = Model(inputs = img_input,outputs = x)
 model.outputWidth = outputWidth
 model.outputHeight = outputHeight
 model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics =['accuracy'])

Solution

The original code in the FCNN architecture example works with an input dimension of (416, 608). Whereas in your code, the input dimension is (192, 192) (ignoring the channel dimension). Now if you notice carefully, this particular layer

x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool', data_format=IMAGE_ORDERING)(x)

generates an output of dimension (6, 6) (you can verify in your model.summary()).

The next convoltuion layer

o = (Convolution2D(4096,(7,7) , activation='relu' , padding='same', data_format=IMAGE_ORDERING))(o)

uses convolution filters of size (7, 7), but your input has already reduced to a size smaller than that (i.e. (6, 6)). Try fixing that first.

Also if you look at the model.summary() output, you'll notice that it does not contain the layers defined after the block5_pool layer. There is a transposed convolution layer in it (which basically upsamples your input). You may want to take a look and try to resolve that as well.

NOTE: In all my dimensions, I have ignored the channel dimension.

EDIT Detailed Answer below

First of all, this is my keras.json file. It uses Tensorflow backend, with image_ordering set at channel_last.

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

Next, I copy paste my exact model code. Please take special note of the inline comments in the code below.

from keras.models import *
from keras.layers import *

IMAGE_ORDERING = 'channels_last' # In consistency with the json file

def getFCN32(nb_classes = 20, input_height = 416, input_width = 608):

    img_input = Input(shape=(input_height,input_width, 3)) # Expected input will have channel in the last dimension

    #Block 1
    x = Convolution2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', data_format=IMAGE_ORDERING)(img_input) 
    x = BatchNormalization()(x)
    x = Convolution2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', data_format=IMAGE_ORDERING)(x)
    f1 = x
    # Block 2
    x = Convolution2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool', data_format=IMAGE_ORDERING )(x)
    f2 = x

    # Block 3
    x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool', data_format=IMAGE_ORDERING )(x)
    f3 = x

    # Block 4
    x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2',data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3',data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool', data_format=IMAGE_ORDERING)(x)
    f4 = x

    # Block 5
    x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2',data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3', data_format=IMAGE_ORDERING)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool', data_format=IMAGE_ORDERING)(x)
    f5 = x

    x = (Convolution2D(4096,(7,7) , activation='relu' , padding='same', data_format=IMAGE_ORDERING))(x)
    x = Dropout(0.5)(x)
    x = (Convolution2D(4096,(1,1) , activation='relu' , padding='same',data_format=IMAGE_ORDERING))(x)
    x = Dropout(0.5)(x)

    x = (Convolution2D(20,(1,1) ,kernel_initializer='he_normal' ,data_format=IMAGE_ORDERING))(x)
    x = Convolution2DTranspose(20,kernel_size=(64,64), strides=(32,32),use_bias=False,data_format=IMAGE_ORDERING)(x)
    o_shape = Model(img_input, x).output_shape

    # NOTE: Since this is channel last dimension ordering, the height and width dimensions are along [1] and [2], not [2] and [3]
    outputHeight = o_shape[1]
    outputWidth = o_shape[2]

    x = (Reshape((outputHeight*outputWidth, 20)))(x) # Channel should be along the last dimenion of reshape
    # No need of permute layer anymore

    print("Output shape before softmax is", o_shape)
    x = (Activation('softmax'))(x)
    print("Output shape after softmax is", o_shape)
    model = Model(inputs = img_input,outputs = x)
    model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics =['accuracy'])

    return model

model = getFCN32(20)
print(model.summary())

Next I will provide with snippets of how my model.summary() looks. If you take a look at the last few layers, it is something like this:

So this means, the conv2d_transpose layer produces an output of dimension (448, 640, 20) and flattens it out before applying softmax on it. So dimension of output is (286720, 20). Similarly your target_generator (mask_generator in your case) should also generate targets of similar dimension. Similarly, your input_generator should also be producing input batches of size [batch size, input_height,input_width, 3], as mentioned in the img_input of your function.

Hopefully this will help you to get to the bottom of your problem and figure out a suitable solution. Please take a look at the minor variations in the code (along with the in-line comments) and how to create your input and target batches.