python keras deep-learning conv-neural-network semantic-segmentation

Last convolutional layer in U-Net architecture is expecting wrong dimension

I am trying to implement U-Net in Keras, but I get this error while training the model (when calling model.fit()):

ValueError: Error when checking target: expected conv2d_302 to have shape (None, 1, 128, 640) but got array with shape (360, 1, 128, 128)

The output of model.summary() is:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_19 (InputLayer)           (None, 1, 128, 128)  0                                            
__________________________________________________________________________________________________
conv2d_303 (Conv2D)             (None, 32, 128, 128) 320         input_19[0][0]                   
__________________________________________________________________________________________________
conv2d_304 (Conv2D)             (None, 32, 128, 128) 9248        conv2d_303[0][0]                 
__________________________________________________________________________________________________
max_pooling2d_83 (MaxPooling2D) (None, 32, 64, 64)   0           conv2d_304[0][0]                 
__________________________________________________________________________________________________
conv2d_305 (Conv2D)             (None, 64, 64, 64)   18496       max_pooling2d_83[0][0]           
__________________________________________________________________________________________________
conv2d_306 (Conv2D)             (None, 64, 64, 64)   36928       conv2d_305[0][0]                 
__________________________________________________________________________________________________
max_pooling2d_84 (MaxPooling2D) (None, 64, 32, 32)   0           conv2d_306[0][0]                 
__________________________________________________________________________________________________
conv2d_307 (Conv2D)             (None, 128, 32, 32)  73856       max_pooling2d_84[0][0]           
__________________________________________________________________________________________________
conv2d_308 (Conv2D)             (None, 128, 32, 32)  147584      conv2d_307[0][0]                 
__________________________________________________________________________________________________
max_pooling2d_85 (MaxPooling2D) (None, 128, 16, 16)  0           conv2d_308[0][0]                 
__________________________________________________________________________________________________
conv2d_309 (Conv2D)             (None, 256, 16, 16)  295168      max_pooling2d_85[0][0]           
__________________________________________________________________________________________________
conv2d_310 (Conv2D)             (None, 256, 16, 16)  590080      conv2d_309[0][0]                 
__________________________________________________________________________________________________
max_pooling2d_86 (MaxPooling2D) (None, 256, 8, 8)    0           conv2d_310[0][0]                 
__________________________________________________________________________________________________
conv2d_311 (Conv2D)             (None, 512, 8, 8)    1180160     max_pooling2d_86[0][0]           
__________________________________________________________________________________________________
conv2d_312 (Conv2D)             (None, 512, 8, 8)    2359808     conv2d_311[0][0]                 
__________________________________________________________________________________________________
conv2d_transpose_29 (Conv2DTran (None, 256, 16, 16)  524544      conv2d_312[0][0]                 
__________________________________________________________________________________________________
concatenate_29 (Concatenate)    (None, 256, 16, 32)  0           conv2d_transpose_29[0][0]        
                                                                 conv2d_310[0][0]                 
__________________________________________________________________________________________________
conv2d_313 (Conv2D)             (None, 256, 16, 32)  590080      concatenate_29[0][0]             
__________________________________________________________________________________________________
conv2d_314 (Conv2D)             (None, 256, 16, 32)  590080      conv2d_313[0][0]                 
__________________________________________________________________________________________________
conv2d_transpose_30 (Conv2DTran (None, 128, 32, 64)  131200      conv2d_314[0][0]                 
__________________________________________________________________________________________________
concatenate_30 (Concatenate)    (None, 128, 32, 96)  0           conv2d_transpose_30[0][0]        
                                                                 conv2d_308[0][0]                 
__________________________________________________________________________________________________
conv2d_315 (Conv2D)             (None, 128, 32, 96)  147584      concatenate_30[0][0]             
__________________________________________________________________________________________________
conv2d_316 (Conv2D)             (None, 128, 32, 96)  147584      conv2d_315[0][0]                 
__________________________________________________________________________________________________
conv2d_transpose_31 (Conv2DTran (None, 64, 64, 192)  32832       conv2d_316[0][0]                 
__________________________________________________________________________________________________
concatenate_31 (Concatenate)    (None, 64, 64, 256)  0           conv2d_transpose_31[0][0]        
                                                                 conv2d_306[0][0]                 
__________________________________________________________________________________________________
conv2d_317 (Conv2D)             (None, 64, 64, 256)  36928       concatenate_31[0][0]             
__________________________________________________________________________________________________
conv2d_318 (Conv2D)             (None, 64, 64, 256)  36928       conv2d_317[0][0]                 
__________________________________________________________________________________________________
conv2d_transpose_32 (Conv2DTran (None, 32, 128, 512) 8224        conv2d_318[0][0]                 
__________________________________________________________________________________________________
concatenate_32 (Concatenate)    (None, 32, 128, 640) 0           conv2d_transpose_32[0][0]        
                                                                 conv2d_304[0][0]                 
__________________________________________________________________________________________________
conv2d_319 (Conv2D)             (None, 32, 128, 640) 9248        concatenate_32[0][0]             
__________________________________________________________________________________________________
conv9 (Conv2D)                  (None, 32, 128, 640) 9248        conv2d_319[0][0]                 
__________________________________________________________________________________________________
conv2d_320 (Conv2D)             (None, 1, 128, 640)  33          conv9[0][0]                      
==================================================================================================
Total params: 6,976,161
Trainable params: 6,976,161
Non-trainable params: 0

Here is the model code:

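from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate
from keras.optimizers import Adam

img_rows = 128
img_cols = 128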
inputs = Input((1, img_rows, img_cols))
conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2)
conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

conv4 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool3)
conv4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv4)
pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

conv5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool4)
conv5 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv5)

up6 = concatenate([Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(conv5), conv4], axis=3)
conv6 = Conv2D(256, (3, 3), activation='relu', padding='same')(up6)
conv6 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv6)

up7 = concatenate([Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv6), conv3], axis=3)
conv7 = Conv2D(128, (3, 3), activation='relu', padding='same')(up7)
conv7 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv7)

up8 = concatenate([Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv7), conv2], axis=3)
conv8 = Conv2D(64, (3, 3), activation='relu', padding='same')(up8)
conv8 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv8)

up9 = concatenate([Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(conv8), conv1], axis=3)
conv9 = Conv2D(32, (3, 3), activation='relu', padding='same')(up9)
conv9 = Conv2D(32, (3, 3), activation='relu', padding='same', name='conv9')(conv9)

conv10 = Conv2D(1, (1, 1), activation='sigmoid')(conv9)

model = Model(inputs=[inputs], outputs=[conv10])

model.compile(optimizer=Adam(lr=1e-5), loss="mean_absolute_error")
model.summary()
model.fit(X_train, y_train, batch_size=36, epochs=5)

I don't understand why the output shape of the second-to-last layer (conv9) differs from what I expect the last layer (conv10) to receive.

The Keras model is courtesy of https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py .

Updated: Added the complete model.summary().

Solution

Almost certainly, the author of the original model intended to concatenate along the channels dimension, not along one of the image dimensions.

Image tensors in Keras convolutional networks can be in one of two formats:

(batch_size, height, width, channels), called "channels_last"

(batch_size, channels, height, width), called "channels_first"

The model you linked uses the first format, but your model uses the second, so axis=3 refers to the image width rather than to the channels.
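As a small illustration of the two layouts, the sketch below labels each axis of a 4D shape. The helper describe_axes and the AXIS_NAMES table are made up for this example and are not part of Keras; they only restate the two formats listed above.

```python
# Axis names for the two Keras image layouts (batch axis included).
# AXIS_NAMES and describe_axes are hypothetical helpers, not Keras APIs.
AXIS_NAMES = {
    "channels_last": ("batch", "height", "width", "channels"),
    "channels_first": ("batch", "channels", "height", "width"),
}


def describe_axes(shape, data_format):
    """Pair each axis index of a 4D shape with its meaning in the given layout."""
    names = AXIS_NAMES[data_format]
    return {axis: (name, size) for axis, (name, size) in enumerate(zip(names, shape))}


# The input shape from the question's summary, read as channels_first.
axes = describe_axes((None, 1, 128, 128), "channels_first")
print(axes[1])  # ('channels', 1)
print(axes[3])  # ('width', 128)
```

Read this way, axis=3 in the question's concatenate calls is the image width, which is why that dimension keeps growing in the summary.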

You can fix it in one of the two ways:

Change axis=3 in the concatenate layers to axis=1.

Set data_format="channels_last" in the convolutional layers (the pooling and transposed-convolution layers take the same argument), and put the channel axis last in the input shape. If data_format is omitted, its default is taken from the Keras config, and most likely this value was different for you and for the author of the model you used. See https://keras.io/layers/convolutional/#conv2d
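The effect of the concatenation axis can be checked with plain shape arithmetic, without Keras. The sketch below is only a hand-written trace of the decoder in the question, assuming the channels_first layout and the encoder shapes printed in model.summary(); trace_decoder is a hypothetical helper, not a Keras function.

```python
def trace_decoder(concat_axis):
    """Trace channels_first shapes (channels, height, width) through the decoder.

    concat_axis is the Keras axis index including the batch axis,
    so 1 means channels and 3 means width in this layout.
    """
    # Encoder outputs that feed the skip connections, taken from model.summary().
    skips = [(256, 16, 16), (128, 32, 32), (64, 64, 64), (32, 128, 128)]
    shape = (512, 8, 8)  # output of the bottleneck (conv5)
    for skip in skips:
        filters = skip[0]
        # Conv2DTranspose with strides=(2, 2) and padding='same' doubles height and width.
        up = [filters, shape[1] * 2, shape[2] * 2]
        # Concatenation adds the sizes along the chosen axis (minus 1 for the batch axis).
        up[concat_axis - 1] += skip[concat_axis - 1]
        # The following 3x3 'same' convolutions only reset the channel count.
        shape = (filters, up[1], up[2])
    # The final 1x1 convolution has a single filter.
    return (1, shape[1], shape[2])


print(trace_decoder(concat_axis=3))  # (1, 128, 640)
print(trace_decoder(concat_axis=1))  # (1, 128, 128)
```

With axis=3 the trace reproduces the (None, 1, 128, 640) output from the error message; with axis=1 the output matches the (1, 128, 128) targets.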

UPDATE: as a matter of fact, the original model changes the data_format at the very beginning of the file you linked to:

K.set_image_data_format('channels_last')

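Add this line before building your model and it should resolve the mismatch, provided the input shape and your arrays use the same layout, for example Input((img_rows, img_cols, 1)).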
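For reference, here is a minimal sketch of the channels_last variant, cut down to a single skip connection to keep it short. It is not the original model: the layer counts are reduced, and data_format is passed explicitly to each layer (an assumption made here so the example does not depend on the local Keras config) instead of calling K.set_image_data_format.

```python
from keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate
from keras.models import Model

img_rows, img_cols = 128, 128
fmt = "channels_last"  # passed explicitly so the sketch ignores the local Keras config

# In channels_last the channel axis comes last, so the input is (rows, cols, 1).
inputs = Input((img_rows, img_cols, 1))
conv1 = Conv2D(32, (3, 3), activation="relu", padding="same", data_format=fmt)(inputs)
pool1 = MaxPooling2D(pool_size=(2, 2), data_format=fmt)(conv1)
conv2 = Conv2D(64, (3, 3), activation="relu", padding="same", data_format=fmt)(pool1)

# Upsample back to 128x128, then concatenate along axis 3, which is now the channel axis.
up = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding="same", data_format=fmt)(conv2)
merged = concatenate([up, conv1], axis=3)
conv3 = Conv2D(32, (3, 3), activation="relu", padding="same", data_format=fmt)(merged)

outputs = Conv2D(1, (1, 1), activation="sigmoid", data_format=fmt)(conv3)
model = Model(inputs=inputs, outputs=outputs)
print(model.output_shape)  # expected: (None, 128, 128, 1)
```

Arrays stored as (N, 1, 128, 128), like the X_train and y_train in the question, would also need their axes rearranged to (N, 128, 128, 1), for instance with np.transpose(x, (0, 2, 3, 1)).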