I am currently working on a CNN for feature extraction from images using Keras. All the images have 276 rows, x columns and 3 color channels (RGB). The number of columns is equal to the length of the output feature vector it should generate.
Input data representation - edit:
The input data given to the network consists of column-wise slices of the image, which means the actual input to the network is (276, 3), and the number of columns is equal to the feature length it should generate.
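To illustrate (the width of 100 below is just an example; the real width varies per image):
import numpy as np

# illustration only: a 276 x 100 RGB image, i.e. 100 column-wise slices of shape (276, 3);
# from this image the network should produce a feature vector of length 100
image = np.random.rand(276, 100, 3)
print image.shape           # (276, 100, 3)
print image[:, 0, :].shape  # one column-wise slice: (276, 3)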
My initial model is as such:
print "Model Definition"
model = Sequential()
model.add(Convolution2D(64,row,1,input_shape=(row,None,3)))
print model.output_shape
model.add(MaxPooling2D(pool_size=(1,64)))
print model.output_shape
model.add(Dense(1,activation='relu'))
My print statements in between print the output shape, and I am a bit confused by the output.
Model Definition
(None, 1, None, 64)
(None, 1, None, 64)
How come the 3D data becomes 4D? And why does it stay that way after the MaxPooling2D layer?
My dense layer/fully-connected layer is giving me some problems with the dimensions here:
Traceback (most recent call last):
File "keras_convolutional_feature_extraction.py", line 466, in <module>
model(0,train_input_data,output_data_train,test_input_data,output_data_test)
File "keras_convolutional_feature_extraction.py", line 440, in model
model.add(Dense(1,activation='relu'))
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 324, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 474, in __call__
self.assert_input_compatibility(x)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 415, in assert_input_compatibility
str(K.ndim(x)))
Exception: Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=4
So why am I not able to get the data down to a single value from a 3D image?
You are operating on a 276 x None x 3 image using 64 convolutional filters, each of size 276 x 1 (assuming rows = 276). One convolutional filter will output a matrix of size 1 x None. Read this in detail if you do not know how convolutional filters work. So for 64 filters (in the Theano backend) you will get a matrix of size 64 x 1 x None. In the Tensorflow backend, I think it will be 1 x None x 64. Now, the first dimension for Keras-Theano is always samples. So, your final output shape will be None x 64 x 1 x None. For Tensorflow, it will be None x 1 x None x 64. Read this for more information on different backends in Keras.
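As a side note (assuming Keras 1.x, which your Convolution2D calls suggest), you can check which backend and dimension ordering you are running with:
from keras import backend as K
print K.backend()             # 'theano' or 'tensorflow'
print K.image_dim_ordering()  # 'th' -> (samples, channels, rows, cols), 'tf' -> (samples, rows, cols, channels)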
To remove the dense layer error, I think you will need to flatten the output by introducing the following line before adding the Dense layer.
model.add(Flatten())
However, I do not really understand the use of a dense layer here. As you must be aware, a dense layer only accepts a fixed input size and produces a fixed-size output. So your None dimension will basically be restricted to a single value if you want your network to run without throwing errors. If you want an output of shape 1 x None, then you should not include dense layers and instead use average pooling at the end to collapse the response to a 1 x 1 x None output.
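To make the Dense route concrete, here is a minimal sketch assuming you fix the width to some value (n_cols = 100 is made up) and the 'tf' dimension ordering that your printed shapes suggest:
from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dense

n_cols = 100  # hypothetical fixed width; Dense cannot work with a None dimension
model = Sequential()
model.add(Convolution2D(64, 276, 1, input_shape=(276, n_cols, 3)))
model.add(Flatten())
model.add(Dense(1, activation='relu'))
print model.output_shape  # (None, 1) -- one value per image, the variable dimension is gone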
Edit: If you have an image of size 276 x n x 3, where the number of columns n is variable, and you want an output of size 1 x n, then you can do as follows:
model = Sequential()
model.add(Convolution2D(64,row,1,input_shape=(row,None,3)))
model.add(Convolution2D(1,1,1))
print model.output_shape # this should print `None x 1 x None x 1`
model.add(Flatten())
Now, I doubt this network will perform very well since it has only one layer of 64 filters. The receptive field is also too large (276, i.e. the full height of the image). You can do two things:
In the following, I will assume that the image height is 50. Then you can write a network as follows:
model = Sequential()
model.add(Convolution2D(32,3,1,activation='relu',
init='he_normal',input_shape=(row,None,3))) # row = 50
model.add(Convolution2D(32,3,1,activation='relu',init='he_normal'))
model.add(MaxPooling2D(pool_size=(2,1), strides=(2,1), name='pool1'))
model.add(Convolution2D(64,3,1,activation='relu',init='he_normal'))
model.add(Convolution2D(64,3,1,activation='relu',init='he_normal'))
model.add(MaxPooling2D(pool_size=(2,1), strides=(2,1), name='pool2'))
model.add(Convolution2D(128,3,1,activation='relu',init='he_normal'))
model.add(Convolution2D(128,3,1,activation='relu',init='he_normal'))
model.add(Convolution2D(128,3,1,activation='relu',init='he_normal'))
model.add(MaxPooling2D(pool_size=(2,1), strides=(2,1), name='pool3'))
model.add(Convolution2D(1,1,1, name='squash_channels'))
print model.output_shape # this should print `None x 1 x None x 1`
model.add(Flatten(name='flatten_input'))
You should verify that all these convolutional and max-pooling layers are reducing the input height from 50 to 1 after the last max-pooling.
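One quick way to verify this (a sanity check, nothing more) is to print each layer's output shape after building the model:
for layer in model.layers:
    print layer.name, layer.output_shape  # the height entry should shrink 48, 46, 23, 21, 19, 9, 7, 5, 3, 1, 1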
How to handle variable-sized images
One way is to first determine a common size for your dataset, e.g. 224. Then construct the network for a 224 x n image as shown above (maybe a little deeper). Now let us say you get an image of a different size, say p x n', where p > 224 and n' != n. You can take a center crop of the image of size 224 x n' and pass it through the network. You have your feature vector.
If you think that the majority of the information is not concentrated around the center, then you can take multiple crops and then average (or max-pool) the multiple feature vectors obtained. Using these methods, I think you should be able to handle variable-sized inputs.
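As a sketch of what I mean (the helper names and the 224 target height are assumptions, and model is the fixed-height network from above):
import numpy as np

def center_crop_height(img, target_h=224):
    # img is a (p, n, 3) array with p >= target_h; keep all columns,
    # take the target_h rows around the vertical center
    top = (img.shape[0] - target_h) // 2
    return img[top:top + target_h, :, :]

def averaged_crop_features(model, img, target_h=224, n_crops=3):
    # take a few random crops along the height and average their feature vectors
    feats = []
    for _ in range(n_crops):
        top = np.random.randint(0, img.shape[0] - target_h + 1)
        crop = img[top:top + target_h, :, :]
        feats.append(model.predict(crop[np.newaxis, ...])[0])
    return np.mean(feats, axis=0)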
Edit:
See the CNN that I defined, which uses 3 x 1 convolutions. Assume that the input is of size 50 x n x 3. Let us say that we pass an input of size p x q x r through a convolutional layer which has f filters, each of size 3 x 1, with stride 1 and no padding. Then the output of the convolutional layer will be of size (p-2) x q x f, i.e. the output height will be two less than that of the input (the width is unchanged since the filter width is 1). Our pooling layers have pool size (2,1) and stride (2,1), so they halve the input in the y-direction (i.e. halve the image height). Keeping this in mind, the following is straightforward to derive (observe the layer names I have given in my CNN; they are referenced below).
CNN input: None x 50 x n x 3
Input of pool1 layer: None x 46 x n x 32
Output of pool1 layer: None x 23 x n x 32
Input of pool2 layer: None x 19 x n x 64
Output of pool2 layer: None x 9 x n x 64 (I think Keras pooling takes the floor, i.e. floor(19/2) = 9)
Input of pool3 layer: None x 3 x n x 128
Output of pool3 layer: None x 1 x n x 128
Input of squash_channels layer: None x 1 x n x 128
Output of squash_channels layer: None x 1 x n x 1
Input of flatten_input layer: None x 1 x n x 1
Output of flatten_input layer: None x n
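If you want to double-check the height arithmetic without building the model, the bookkeeping is just this (a 3 x 1 valid convolution subtracts 2 from the height, and a (2,1) pooling with stride (2,1) floors a halving):
h = 50
h = (h - 2 - 2) // 2      # conv, conv, pool1 -> 23
h = (h - 2 - 2) // 2      # conv, conv, pool2 -> 9
h = (h - 2 - 2 - 2) // 2  # conv, conv, conv, pool3 -> 1
print h  # 1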
I think this is what you wanted. I hope it's clear now.