I have been studying CNNs for a while but don't understand them well yet, so I have included everything I thought was important.
I have a dataset of hand gestures containing 1400 images in 10 classes. I am building a CNN model with Keras in the Spyder IDE. The Sequential model is below.
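from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

# input_shape and num_classes are defined earlier (not shown)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=input_shape))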
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
I trained it for 30 epochs and got:
Test Loss: 0.260991449015
Test accuracy: 0.928571430274
             precision    recall  f1-score   support

    class 0       1.00      0.93      0.96        28
    class 1       0.96      0.96      0.96        26
    class 2       0.92      1.00      0.96        24
    class 3       0.72      0.87      0.79        30
    class 4       0.97      0.97      0.97        35
    class 5       0.90      0.93      0.92        29
    class 6       0.93      1.00      0.97        28
    class 7       1.00      0.97      0.98        33
    class 8       1.00      0.95      0.97        19
    class 9       0.95      0.71      0.82        28

avg / total       0.93      0.93      0.93       280
Confusion matrix, without normalization
[[26 0 0 0 1 0 1 0 0 0]
[ 0 25 1 0 0 0 0 0 0 0]
[ 0 0 24 0 0 0 0 0 0 0]
[ 0 0 1 26 0 3 0 0 0 0]
[ 0 1 0 0 34 0 0 0 0 0]
[ 0 0 0 1 0 27 1 0 0 0]
[ 0 0 0 0 0 0 28 0 0 0]
[ 0 0 0 0 0 0 0 32 0 1]
[ 0 0 0 1 0 0 0 0 18 0]
[ 0 0 0 8 0 0 0 0 0 20]]
Q1: Is this model doing well?
Q2: Am I overfitting?
Q3: How can I build the best possible CNN for this task?
Thank you for your time
Your confusion matrix for the test set has large counts on the diagonal and almost all zeros elsewhere, which indicates that the model fits the test data well. The one notable exception is the entry of 8 in the last row: 8 of the 28 class 9 images were predicted as class 3, which is why class 9 has a recall of only 0.71 and class 3 a precision of only 0.72.
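To make that reading of the matrix concrete, here is a small NumPy sketch that recomputes the per-class figures from the numbers you posted and locates the largest off-diagonal entry. It assumes that rows are true classes and columns are predicted classes (the scikit-learn convention), which is consistent with the row sums matching the "support" column of your report. The variable names are only illustrative.

```python
import numpy as np

# Confusion matrix copied from the question.
# Assumption: rows are true classes, columns are predicted classes.
cm = np.array([
    [26,  0,  0,  0,  1,  0,  1,  0,  0,  0],
    [ 0, 25,  1,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0, 24,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  1, 26,  0,  3,  0,  0,  0,  0],
    [ 0,  1,  0,  0, 34,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  1,  0, 27,  1,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0, 28,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 32,  0,  1],
    [ 0,  0,  0,  1,  0,  0,  0,  0, 18,  0],
    [ 0,  0,  0,  8,  0,  0,  0,  0,  0, 20],
])

# Overall accuracy: correct predictions (diagonal) over all test images.
accuracy = cm.trace() / cm.sum()

# Recall per class: correct predictions divided by the number of true examples (row sum).
recall = cm.diagonal() / cm.sum(axis=1)

# Precision per class: correct predictions divided by the number of predictions (column sum).
precision = cm.diagonal() / cm.sum(axis=0)

# Find the most frequent confusion by zeroing the diagonal and taking the maximum.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(off_diag.argmax(), off_diag.shape)

print("accuracy: %.4f" % accuracy)
print("recall:   ", np.round(recall, 2))
print("precision:", np.round(precision, 2))
print("most common error: true class %d predicted as class %d (%d times)"
      % (true_cls, pred_cls, off_diag.max()))
```

Looking at a few of the class 9 images that were predicted as class 3 is probably the quickest way to see whether the two gestures are genuinely similar or whether something else (lighting, cropping, labelling) explains the confusion.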
However, your dataset contains only 1400 images across 10 classes, i.e. 140 images per class on average, which is rather few for a neural network to generalize well. I don't know how much diversity there is in your dataset or how you intend to deploy the network in production. For example, suppose every image in your dataset shows only a hand against a green background. If the production inputs match those conditions (green background, hand only), your network may well perform well. If they do not, the model is very likely to perform badly.
To add diversity to your dataset, you can use Keras's ImageDataGenerator to simulate various kinds of distortions, so that your network learns more of the relevant features.
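As a rough sketch of what that could look like: the snippet below builds an ImageDataGenerator with a few mild distortions and draws one augmented batch from it. The random arrays are hypothetical stand-ins for your real training data, the 64x64 channels-last image shape is an assumption, and the distortion ranges are illustrative guesses rather than tuned values. I left out horizontal flipping on purpose, since mirroring a hand gesture may change its meaning.

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical stand-in for the real training set: 16 RGB images of
# 64x64 pixels (channels last) with one-hot labels for 10 classes.
rng = np.random.RandomState(0)
x_train = rng.rand(16, 64, 64, 3).astype("float32")
y_train = np.eye(10, dtype="float32")[rng.randint(0, 10, size=16)]

# Mild random distortions; the ranges below are guesses, not tuned values.
datagen = ImageDataGenerator(
    rotation_range=15,        # rotate by up to +/- 15 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    zoom_range=0.1,           # zoom in or out by up to 10%
    fill_mode="nearest",      # fill newly exposed pixels with the nearest value
)

# flow() yields endlessly; each batch is a freshly distorted sample of the data.
batches = datagen.flow(x_train, y_train, batch_size=8, seed=0)
x_batch, y_batch = next(batches)
print(x_batch.shape, y_batch.shape)
```

You would then train on the generator instead of the raw arrays (`model.fit_generator(...)` in older Keras versions; newer versions accept the generator in `model.fit(...)`), and keep the test set undistorted so that the reported accuracy stays comparable.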