Building the first input layer of VGG16 in Keras

In this blog post, the author includes a code segment that builds a VGG16 network. I have some questions about the following part of the code:

model = Sequential()
model.add(ZeroPadding2D((1, 1), batch_input_shape=(1, 3, img_width, img_height)))
first_layer = model.layers[-1]
# this is a placeholder tensor that will contain our generated images
input_img = first_layer.input

Regarding model.add(ZeroPadding2D((1, 1), batch_input_shape=(1, 3, img_width, img_height))): is it always the case that ZeroPadding2D is used to build the first layer that reads an image as input? What does (1, 1) indicate as the argument to ZeroPadding2D? According to the Keras documentation, it means that one zero is added to both the rows and the columns. How do we decide how many zeros to add?

Secondly, why do we need the index -1 in first_layer = model.layers[-1]? We only have one layer here, so shouldn't it be 0 instead?

Solution

is it always true that we normally use ZeroPadding2D to build the first layer reading image as input?

It depends. In this particular code, the author intends to perform a 3x3 convolution whose output feature maps have the same width and height as the input image. This is often done when the input size is a power of 2, because it keeps the dimensions even, so each 2x2 pooling layer halves them exactly.

Without padding:

128x128 -[3x3 conv]-> 126x126 -[2x2 pool]-> 63x63 -[3x3 conv]-> 61x61 -> *how to pool next?*

With padding:

128x128 -[pad 1]-> 130x130 -[3x3 conv]-> 128x128 -[2x2 pool]-> 64x64
-[pad+conv+pool]-> 32x32 -[...]-> 16x16 -> 8x8 ...
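
The two chains above can be reproduced with a few lines of arithmetic. This is a minimal sketch in plain Python (no Keras needed); the helper names conv_out, pool_out and trace are made up for illustration, and it assumes stride-1 convolutions and non-overlapping 2x2 pooling that drops any leftover row or column.

```python
def conv_out(size, kernel=3, pad=0):
    """Output side length of a stride-1 convolution."""
    return size + 2 * pad - kernel + 1


def pool_out(size, window=2):
    """Output side length of non-overlapping pooling (leftover pixels are dropped)."""
    return size // window


def trace(size, pad, blocks=3):
    """Side lengths after each conv and each pool, for `blocks` conv+pool blocks."""
    sizes = [size]
    for _ in range(blocks):
        size = conv_out(size, kernel=3, pad=pad)
        sizes.append(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes


print(trace(128, pad=0))  # [128, 126, 63, 61, 30, 28, 14]
print(trace(128, pad=1))  # [128, 128, 64, 64, 32, 32, 16]
```

Without padding, the odd size 61 cannot be split evenly into 2x2 windows, so one row and one column are lost at that pooling step. With padding, every size stays a power of 2.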

What does (1,1) indicate for the input parameter of ZeroPadding2D?

If the input image is 128x128, (1,1) zero padding produces a 130x130 image by adding a 1-pixel-wide black frame around it. The two values give the number of rows of zeros added at the top and bottom and the number of columns of zeros added at the left and right, respectively.

           o o o o o
x x x      o x x x o
x x x  ->  o x x x o
x x x      o x x x o
           o o o o o
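
The same picture can be reproduced on a small nested list. This is only an illustrative sketch in plain Python, not what Keras does internally; zero_pad is a hypothetical helper, and the image values are placeholders.

```python
def zero_pad(image, pad_rows, pad_cols):
    """Surround a 2D list with pad_rows rows and pad_cols columns of zeros on each side."""
    width = len(image[0]) + 2 * pad_cols
    # Rows of zeros on top.
    padded = [[0] * width for _ in range(pad_rows)]
    # Original rows, with zeros added on the left and right.
    for row in image:
        padded.append([0] * pad_cols + list(row) + [0] * pad_cols)
    # Rows of zeros at the bottom.
    padded.extend([0] * width for _ in range(pad_rows))
    return padded


image = [[1, 1, 1],
         [1, 1, 1],
         [1, 1, 1]]

padded = zero_pad(image, 1, 1)
for row in padded:
    print(row)
```

The 3x3 input becomes a 5x5 result whose outer ring is all zeros, matching the diagram.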

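If you intend to keep the image dimensions with a 5x5 convolution, you would need (2,2) padding. In general, a stride-1 k x k convolution with odd k needs (k-1)/2 pixels of padding on each side.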
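
The padding for other odd kernel sizes follows the same pattern as the 3x3 and 5x5 cases. A minimal sketch, assuming a stride-1 convolution and a 128-pixel side; same_padding is a hypothetical helper name, not a Keras function.

```python
def same_padding(kernel_size):
    """Zeros to add on each side so a stride-1 convolution keeps the input size."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric padding only preserves size for odd kernel sizes")
    return (kernel_size - 1) // 2


for k in (3, 5, 7):
    p = same_padding(k)
    out = 128 + 2 * p - k + 1  # stride-1 output side for a 128-pixel input side
    print(f"{k}x{k} kernel -> padding ({p}, {p}), output side {out}")
```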

why do we need to set -1 in first_layer = model.layers[-1]?

Using the exact index is fine too; with a single layer, [-1] and [0] refer to the same layer. However, if you someday decide to add a preprocessing layer below the first convolution layer, you don't need to change the [-1] index, because it always gives the topmost (most recently added) layer. Fewer bugs in case you forget.
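
The indexing behaviour is ordinary Python list indexing. The sketch below uses a plain list as a stand-in for model.layers; the strings are placeholders for illustration, not real Keras layer objects.

```python
# A plain list standing in for model.layers.
layers = []
layers.append("zero_padding")

# With a single element, both indices refer to the same item.
assert layers[-1] is layers[0]

layers.append("conv1_1")
print(layers[0])   # zero_padding: always the first layer added
print(layers[-1])  # conv1_1: always the most recently added layer
```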