Shape of tensor for 2D image in Keras

I am new to Keras (and fairly new to TF), and I find the shape definition for the input layer very confusing.

So in the examples, when we have a 1D vector of length 20 for input, shape gets defined as

...Input(shape=(20,)...)

And when a 2D tensor for greyscale images needs to be defined for MNIST, it is defined as:

...Input(shape=(28, 28, 1)...)

So my question is: why are the tensors not defined as (20) and (28, 28)? Why is a second dimension added and left empty in the first case? And why does the number of channels have to be defined in the second?

I understand that it depends on the layer so Conv1D, Dense or Conv2D take different shapes but it seems the first parameter is implicit?

According to the docs, Dense needs input of shape (batch_size, ..., input_dim), but how does this relate to the example:

Dense(32, input_shape=(784,))

Thanks

Solution

Tuples vs numbers

input_shape must be a tuple, so only (20,) can satisfy it. In Python, (20) is just the number 20 in parentheses, not a tuple; the trailing comma is what makes a one-element tuple, so no second dimension is being added and left empty. There is also the parameter input_dim, to make your life easier if you have only one dimension. This parameter can take 20. (But really, I find it just confusing; I always work with input_shape and use tuples, to keep a consistent understanding.)

Dense(32, input_shape=(784,)) is the same as Dense(32, input_dim=784).
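
A quick check in plain Python (nothing Keras-specific here, and the variable names are only for illustration) shows why the comma matters:

```python
# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = (20)
one_element_tuple = (20,)

print(type(not_a_tuple).__name__)        # int
print(type(one_element_tuple).__name__)  # tuple
print(len(one_element_tuple))            # 1: a single dimension of size 20

# A 2D shape is simply a tuple with two entries.
image_shape = (28, 28)
print(len(image_shape))                  # 2
```

So (20,) still describes one dimension; the comma is Python syntax, not an extra axis.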

Images

Images don't have only pixels; each pixel also has channels (red, green, blue).
A greyscale image has only one channel.

So the shape is (28 pixels, 28 pixels, 1 channel).

But notice that there isn't any obligation to follow this shape for images everywhere. You can shape them the way you like. But some kinds of layers do demand a certain shape, otherwise they couldn't work.

Some layers demand specific shapes

This is the case for 2D convolutional layers, which need (size1,size2,channels) with the default channels-last data format. They need this shape because they must apply the convolutional filters accordingly.

It's also the case for recurrent layers, which need (timeSteps,featuresPerStep) to perform their recurrent calculations.

MNIST models

Again, there isn't any obligation to shape your image in a specific way. You must do it according to which first layer you choose and what you intend to achieve. The choice is yours.

Many examples simply don't care about an image being a 2D structure, and they just use models that take 784 pixels (28 × 28, flattened). That's enough. They probably start with Dense layers, which demand shapes like (size,).

Other examples may care and use the shape (28,28), but then these models will have to reshape the input to fit the needs of the next layer.

2D convolutional layers will demand (28,28,1).

The main idea is: input arrays must match input_shape or input_dim.
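
To make this concrete, here is a minimal NumPy sketch of the same 28x28 images arranged for each kind of first layer. The array x is random placeholder data standing in for MNIST, and the variable names are mine, not part of any Keras API. The leading 5 is the number of samples, which input_shape leaves out.

```python
import numpy as np

# Placeholder data standing in for 5 greyscale images (assumption: not real MNIST).
x = np.random.rand(5, 28, 28)

# For a model starting with Dense and input_shape=(784,): flatten each image.
x_flat = x.reshape(5, 784)

# For a model with input_shape=(28, 28): keep the 2D structure as it is.
x_2d = x

# For a model starting with Conv2D and input_shape=(28, 28, 1): add a channel axis.
x_conv = x.reshape(5, 28, 28, 1)

print(x_flat.shape)  # (5, 784)
print(x_2d.shape)    # (5, 28, 28)
print(x_conv.shape)  # (5, 28, 28, 1)
```

The pixel values are identical in all three arrays; only the arrangement changes to suit the first layer.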

Tensor shapes

Be careful, though, when reading Keras error messages or working with custom / lambda layers.

All these shapes we defined before omit an important dimension: the batch size or the number of samples.

Internally all tensors will have this additional dimension as the first dimension. Keras will report it as None (a dimension that will adapt to any batch size you have).

So, input_shape=(784,) will be reported as (None,784).
And input_shape=(28,28,1) will be reported as (None,28,28,1).

And your actual input data must have a shape that matches that reported shape, with None standing for however many samples you pass.
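
The rule can be sketched in a few lines of Python. The two functions below are hypothetical helpers written only to illustrate it; they are not part of Keras, and they simply mimic the behaviour described above using NumPy arrays as placeholder data.

```python
import numpy as np

def reported_shape(input_shape):
    """Hypothetical helper: prepend the batch dimension, shown as None."""
    return (None,) + tuple(input_shape)

def data_matches(data, input_shape):
    """Hypothetical helper: True if every axis after the first matches input_shape."""
    return data.shape[1:] == tuple(input_shape)

print(reported_shape((784,)))        # (None, 784)
print(reported_shape((28, 28, 1)))   # (None, 28, 28, 1)

batch = np.zeros((32, 28, 28, 1))    # 32 samples of placeholder data
single = np.zeros((28, 28, 1))       # one image with no batch axis

print(data_matches(batch, (28, 28, 1)))   # True
print(data_matches(batch, (784,)))        # False
print(data_matches(single, (28, 28, 1)))  # False: the batch axis is missing
```

The last case is a common source of shape errors: a single image still needs a leading axis of size 1, for example via single[np.newaxis, ...].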