Calculation of Keras layer output dimensions

I am currently trying to implement the GoogLeNet architecture (Inception v1) in Keras with the Theano backend, because I want to generate features for the CUB dataset using a GoogLeNet model.

I found an implementation in Keras here.

However, it is based on an earlier version of Keras, so I had to change the layers to match the Keras 2 API.

Now the model builds correctly. However, predict() fails with the following error:

ValueError: CorrMM images and kernel must have the same stack size
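In Theano's convolution, the "stack size" is the number of input channels, so this error means the tensor reaching a convolution has a different channel count than its kernel expects. One common cause (an assumption here, not confirmed for this model) is a mix-up between channels-first and channels-last ordering, which Keras 2 reports via `keras.backend.image_data_format()`. The plain-Python sketch below uses a hypothetical helper, `channel_count`, only to show how the same image shape yields a different "stack size" depending on which axis is read as the channel axis.

```python
def channel_count(shape, data_format):
    """Return the channel-axis size of an image shape without the batch axis.

    channels_last  -> (height, width, channels)
    channels_first -> (channels, height, width)
    """
    if data_format == "channels_last":
        return shape[-1]
    if data_format == "channels_first":
        return shape[0]
    raise ValueError("unknown data_format: %r" % data_format)


image_shape = (224, 224, 3)  # an RGB image stored channels-last
kernel_in_channels = 3       # a first-layer kernel built for 3 input channels

for fmt in ("channels_last", "channels_first"):
    stack = channel_count(image_shape, fmt)
    status = "ok" if stack == kernel_in_channels else "mismatch"
    print(fmt, stack, status)
```

Reading the channels-last shape as channels-first reports 224 channels instead of 3, which is the kind of disagreement the CorrMM check rejects.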

So I started going through the original paper and comparing the layers it describes with the ones in the implementation.

I found that the first layer produces the expected output of 112x112x64 for a 224x224x3 input.

However, when I calculate the expected output dimensions with the formula from the Stanford University tutorial page, the result differs from the output I actually get from the Keras code, even though the Keras output is what the GoogLeNet paper expects. The formula on the Stanford page is:

Output height (or width) = ((Input height (or width) - Filter size + 2 * Padding) / Stride) + 1
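Plugging the first GoogLeNet convolution into that formula (7x7 kernel, stride 2, and assuming 3 pixels of zero padding on each side) shows where the fraction comes from. The helper name `stanford_output_size` is hypothetical; `fractions.Fraction` keeps the result exact instead of rounding it silently.

```python
from fractions import Fraction


def stanford_output_size(w, f, p, s):
    """Exact value of ((W - F + 2P) / S) + 1, with no rounding applied."""
    return Fraction(w - f + 2 * p, s) + 1


size = stanford_output_size(w=224, f=7, p=3, s=2)
print(size, float(size))  # 225/2 112.5
```

The result is not an integer, so a framework has to round it one way or the other before it can allocate the output tensor.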

With this equation, the output dimension comes out as a fraction, which is not a valid size. To get the expected dimension from the formula, the input would need to be 227x227x3. However, with that input, Keras produces an output of 114x114x64.

Does Keras calculate output dimensions differently, or am I missing something?
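As far as I can tell, Keras 2 does not apply the Stanford formula literally: its shape inference (the `conv_output_length` helper in `keras.utils.conv_utils`) derives the output length from the `padding` mode. The function below is a plain-Python re-implementation of that logic for `'same'` and `'valid'` only (no dilation), written for illustration rather than imported from Keras. It shows that a 224 input with `padding='same'` and stride 2 gives 112, and that a 227 input gives 114 whether it goes through `'same'` padding or through an explicit 3-pixel zero pad followed by a `'valid'` convolution.

```python
def keras_style_output_length(input_length, filter_size, padding, stride):
    """Output length for 'same' or 'valid' padding, without dilation."""
    if padding == "same":
        # The kernel size drops out: the result is ceil(input / stride).
        output_length = input_length
    elif padding == "valid":
        # No padding: count only positions where the whole kernel fits.
        output_length = input_length - filter_size + 1
    else:
        raise ValueError("unsupported padding: %r" % padding)
    # Integer ceiling division by the stride.
    return (output_length + stride - 1) // stride


print(keras_style_output_length(224, 7, "same", 2))           # 112
print(keras_style_output_length(227, 7, "same", 2))           # 114
print(keras_style_output_length(227 + 2 * 3, 7, "valid", 2))  # 114
```

For `'valid'`, `ceil((n - f + 1) / s)` equals `floor((n - f) / s) + 1`, so this is the Stanford formula with P = 0 and the fractional part dropped.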

Solution

Yesterday I managed to make it work by removing a few lines of code from the model that were changing the dimensions (they were possibly required by an earlier version of Keras and Theano).

Also, contrary to the paper, I changed the pool size of MaxPooling2D() from 3x3 to 2x2, which was the only way I found to get the desired output dimensions in the GoogLeNet architecture. Starting from a 224x224 input, each max pooling layer with a 2x2 pool size and a 2x2 stride halves the spatial dimensions, which gives the desired output shapes.
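The paper's 3x3 pooling can also turn a 112x112 input into 56x56, provided the pooling layer is padded (for example one pixel on each side, or `padding='same'` in Keras). The sketch below applies the Stanford formula with the fraction rounded down; `pooled_size` is a hypothetical helper name. Rounding conventions vary between frameworks (Caffe, for instance, rounds pooling output sizes up), so an unpadded 3x3 pool is where they tend to disagree.

```python
def pooled_size(w, f, p, s):
    """floor((W - F + 2P) / S) + 1 for a pooling window of size F."""
    return (w - f + 2 * p) // s + 1


print(pooled_size(112, 2, 0, 2))  # 2x2 window, stride 2, no padding: 56
print(pooled_size(112, 3, 1, 2))  # 3x3 window, stride 2, 1-pixel padding: 56
print(pooled_size(112, 3, 0, 2))  # 3x3 window, stride 2, no padding: 55
```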

I am still not sure why the output-dimension equation, with input size, filter size, padding, and stride as parameters, does not seem to apply here.
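The equation does give an integer once the division is rounded down, as Keras does for `'valid'` convolutions. Worked through for the first GoogLeNet convolution, assuming 3 pixels of zero padding on each side:

```latex
O = \left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1
  = \left\lfloor \frac{224 - 7 + 2 \cdot 3}{2} \right\rfloor + 1
  = \lfloor 111.5 \rfloor + 1
  = 112
```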