deep-learning keras training-data loss convergence

Training VGG on the flowers dataset with Keras: validation loss not changing

I am running a small experiment with the VGG network in Keras. The dataset is the flowers dataset, which has 5 classes: rose, sunflower, dandelion, tulip and daisy.

There is something I cannot figure out. When I used a small CNN (not VGG; it is the active model in the code below), it converged quickly and reached a validation accuracy of about 75% after only about 8 epochs.

Then I switched to the VGG network (the commented-out block in the code). The loss and accuracy did not change at all; the output looked like this:

Epoch 1/50 402/401 [==============================] - 199s 495ms/step - loss: 13.3214 - acc: 0.1713 - val_loss: 13.0144 - val_acc: 0.1926

Epoch 2/50 402/401 [==============================] - 190s 473ms/step - loss: 13.3473 - acc: 0.1719 - val_loss: 13.0144 - val_acc: 0.1926

Epoch 3/50 402/401 [==============================] - 204s 508ms/step - loss: 13.3423 - acc: 0.1722 - val_loss: 13.0144 - val_acc: 0.1926

Epoch 4/50 402/401 [==============================] - 190s 472ms/step - loss: 13.3522 - acc: 0.1716 - val_loss: 13.0144 - val_acc: 0.1926

Epoch 5/50 402/401 [==============================] - 189s 471ms/step - loss: 13.3364 - acc: 0.1726 - val_loss: 13.0144 - val_acc: 0.1926

Epoch 6/50 402/401 [==============================] - 189s 471ms/step - loss: 13.3453 - acc: 0.1720 - val_loss: 13.0144 - val_acc: 0.1926

Epoch 7/50 402/401 [==============================] - 189s 471ms/step - loss: 13.3503 - acc: 0.1717 - val_loss: 13.0144 - val_acc: 0.1926
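
As a rough sanity check on these numbers (a sketch, not a confirmed diagnosis): Keras clips predicted probabilities to [epsilon, 1 - epsilon] before taking the log in categorical cross-entropy, with a default epsilon of 1e-7, so a fully confident wrong prediction costs about -ln(1e-7). Assuming the network puts essentially all probability on a single class for every image, the loss would be that penalty times the fraction of misclassified images. The variable names below are illustrative only.

```python
import math

# Assumption: Keras backend epsilon at its default value of 1e-7.
epsilon = 1e-7
# Cost of one fully confident wrong prediction after clipping.
wrong_penalty = -math.log(epsilon)

# Validation accuracy reported in the log above.
val_acc = 0.1926

# If every prediction is saturated on one class, correct samples cost
# almost nothing and each wrong sample costs wrong_penalty.
expected_val_loss = wrong_penalty * (1.0 - val_acc)

print(round(wrong_penalty, 2))      # per-sample penalty, about 16.12
print(round(expected_val_loss, 2))  # about 13.01, close to the logged val_loss of 13.0144
```

The close match suggests the softmax output is stuck on one class, which is more typical of a network that has diverged than of one that is merely learning slowly.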

PS: I ran the same experiment with other datasets and frameworks as well (the Places365 dataset with TensorFlow and Slim), and the result was the same. I have looked at the VGG paper (Simonyan & Zisserman), which describes training a deep network like VGG in multiple stages, from stage A to stage E, each with a different network structure. I am not sure whether I have to train my VGG network the way the paper describes. The online courses I have seen do not mention this complex training process either. Does anyone have any ideas?

My code:

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K


# dimensions of our images.
img_width, img_height = 224, 224

train_data_dir = './data/train'
validation_data_dir = './data/val'
nb_train_samples = 3213
nb_validation_samples = 457
epochs = 50
batch_size = 8

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

# small CNN model:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))

# vgg model:
'''model = Sequential([
    Conv2D(64, (3, 3), input_shape=input_shape, padding='same',
           activation='relu'),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same',),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(256, (3, 3), activation='relu', padding='same',),
    Conv2D(256, (3, 3), activation='relu', padding='same',),
    Conv2D(256, (3, 3), activation='relu', padding='same',),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(512, (3, 3), activation='relu', padding='same',),
    Conv2D(512, (3, 3), activation='relu', padding='same',),
    Conv2D(512, (3, 3), activation='relu', padding='same',),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(512, (3, 3), activation='relu', padding='same',),
    Conv2D(512, (3, 3), activation='relu', padding='same',),
    Conv2D(512, (3, 3), activation='relu', padding='same',),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(256, activation='relu'),
    Dense(5, activation='softmax')
])'''


model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

model.save_weights('flowers.h5')

Solution

Problem solved: I changed the learning rate to 0.0001, and the network now starts to learn. It seems that 0.001, the default that optimizer='rmsprop' uses, was not small enough for this network.
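
For reference, a minimal sketch of how the learning rate can be set, assuming the same Keras install as the script above. Passing the string 'rmsprop' to compile() gives no way to change the rate, so an optimizer object is built instead. The rate is passed positionally here because the keyword is `lr` in older Keras releases and `learning_rate` in newer ones.

```python
from keras.optimizers import RMSprop

# Learning rate passed as the first positional argument so the same line
# works whether the keyword is `lr` (older Keras) or `learning_rate` (newer).
optimizer = RMSprop(1e-4)

# Then, in place of optimizer='rmsprop' in the script above:
# model.compile(loss='categorical_crossentropy',
#               optimizer=optimizer,
#               metrics=['accuracy'])
```

Only the optimizer argument changes; the loss, metrics and generators stay as they are.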