I am currently trying to get custom Keras layers to work; you can see a simplified version here:
from keras import backend as K
from keras.layers import Layer, Input

class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        print("input_shape: " + str(input_shape))
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)

    def call(self, x):
        print("input tensor: " + str(x))
        return K.dot(x, self.kernel)

inputs = Input(shape=(3, 3), dtype='float32', name='inputs')
results = MyLayer(output_dim=2, input_shape=(3, 3))(inputs)
The resulting console output is this:
input_shape: (None, 3, 3)
input tensor: Tensor("inputs:0", shape=(?, 3, 3), dtype=float32)
As you can see, the input_shape that the layer gets is not (3, 3) as I specified but actually (None, 3, 3). Why is that?
The input tensor also has shape (?, 3, 3), which I thought was a consequence of the odd input_shape (None, 3, 3). But the input tensor still has this extra dimension even if you replace super(MyLayer, self).build(input_shape) with super(MyLayer, self).build((3, 3)). What is this mysterious third dimension that Keras automatically adds, and why does it do that?
It is nothing mysterious: it is the batch dimension. Keras (and most deep learning frameworks) performs computations on batches of data at a time, since this increases parallelism and maps directly to the mini-batches used in stochastic gradient descent. Your layer needs to support computation on batches, so the batch dimension is always present in the input and output data, and Keras automatically prepends it to the input_shape.
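As a minimal sketch of what that means in practice (assuming the standalone Keras 2.x API with the TensorFlow backend, and reusing the MyLayer class from your question; the variable names are just for illustration), you can see that the batch size stays undetermined (None) until real data is passed through the model:

import numpy as np
from keras.layers import Input
from keras.models import Model

# build() receives (None, 3, 3): the batch size is unknown at graph-construction time
inputs = Input(shape=(3, 3), dtype='float32', name='inputs')
outputs = MyLayer(output_dim=2)(inputs)
model = Model(inputs=inputs, outputs=outputs)

# The batch dimension is only fixed once actual data is fed in
batch = np.random.rand(5, 3, 3).astype('float32')  # 5 samples, each of shape (3, 3)
print(model.predict(batch).shape)  # -> (5, 3, 2): batch size first, then the layer's output

Note that the shape=(3, 3) you pass to Input (or input_shape on a layer) always describes a single sample; the batch axis is prepended automatically, which is why build sees (None, 3, 3) rather than (3, 3).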