Tags: python, numpy, tensorflow, keras, neural-network

Inconsistency in input shape to neural network


I coded two simple neural networks, one to add two numbers and one to square a number, and combined them into a program that multiplies two numbers.

import tensorflow as tf
import numpy as np

model_add = tf.keras.models.load_model('model_add.keras')
model_sqr = tf.keras.models.load_model('model_sqr.keras')

predicted_product = model_sqr.predict(model_add.predict(np.array([3, 4]))) - model_sqr.predict(np.array([3])) -  model_sqr.predict(np.array([4]))
print(predicted_product/2)
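The program relies on the identity (a + b)² − a² − b² = 2ab, which is why the result is divided by 2. A minimal sketch with exact arithmetic standing in for the two networks (add, square and multiply_via_squares are hypothetical names, not the trained models) confirms the arithmetic:

```python
# Hypothetical exact stand-ins for the two trained networks.
def add(a, b):
    return a + b

def square(x):
    return x * x

def multiply_via_squares(a, b):
    # (a + b)^2 - a^2 - b^2 = 2ab, hence the final division by 2
    return (square(add(a, b)) - square(a) - square(b)) / 2

print(multiply_via_squares(3, 4))     # 12.0
print(multiply_via_squares(-2.5, 4))  # -10.0
```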

I had specified an input shape of (1,) for the square model and trained it on this data:

x_train = np.random.random((200000,1))*100-50

For the addition model, I trained it on:

X_train = np.random.rand(num_train, 2)

My assumption was that, since each row of the training array is what the model receives as one input during training, test inputs should have that same per-row shape. That seemed to hold for the square model: I trained it on an array of shape (200000, 1) and passed NumPy arrays of shape (1,) as inputs.

But when I run this program, the following error shows up:

Invalid input shape for input Tensor("sequential_1/Cast:0", shape=(2,), dtype=float32). 
Expected shape (None, 2), but input has incompatible shape (2,)

It looks like I should have used np.array([[3, 4]]) instead of np.array([3, 4]). But since each element of the training data for the addition model is of the shape (2,), shouldn't that be what I use?
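The distinction at stake is between one row of the training array and a batch of rows. A short NumPy sketch (the 1000-row array mirrors num_train = 1000 from the addition model's training code; the variable names are illustrative) shows the shapes involved and three equivalent ways to wrap a single sample into a batch of one:

```python
import numpy as np

X_train = np.random.rand(1000, 2)   # 1000 samples, 2 features each

print(X_train.shape)      # (1000, 2): the whole training set
print(X_train[0].shape)   # (2,): one sample, i.e. one row
print(X_train[:1].shape)  # (1, 2): a batch containing one sample

# Three equivalent ways to turn the single sample [3, 4] into a batch of one.
x = np.array([3, 4])
batch_a = np.array([[3, 4]])
batch_b = x.reshape(1, -1)
batch_c = np.expand_dims(x, axis=0)
print(batch_a.shape, batch_b.shape, batch_c.shape)  # (1, 2) (1, 2) (1, 2)
```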

EDIT: The addition model I'm using is:

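import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

num_train = 1000

# Each row of X_train is one sample of shape (2,); the target is the sum of its two columns.
X_train = np.random.rand(num_train, 2)
y_train_add = X_train[:, 0] + X_train[:, 1]

# No input_shape is given, so the model is built the first time it is called
# (during fit()), which fixes its expected input shape to (None, 2).
model_add = Sequential([
    Dense(10),
    Dense(1),
])
batch_size = 32
epochs = 100

model_add.compile(loss='mse', optimizer='adam')
model_add.fit(X_train, y_train_add, batch_size=batch_size, epochs=epochs, verbose=1)
model_add.save('model_add.keras')  # the file loaded by the first snippet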

And the square model I'm using is coded as:

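import numpy as np
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 200000 samples of shape (1,), drawn uniformly from [-50, 50)
x_train = np.random.random((200000, 1)) * 100 - 50
y_train = np.square(x_train)

model_sqr = Sequential([
    Dense(8, activation='elu', kernel_regularizer=regularizers.l2(0.001), input_shape=(1,)),
    Dense(8, activation='elu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(1),
])

batch_size = 32
epochs = 100

model_sqr.compile(loss='mse', optimizer='adam')
model_sqr.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)
model_sqr.save('model_sqr.keras')  # the file loaded by the first snippet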

Solution

  • Training in TensorFlow/Keras is done in batches: fit() splits your training data along its first axis into batches of batch_size samples (32 by default, which is also the value you set) and propagates them through the model one batch at a time. So each training sample is indeed one input, but the model always receives the samples stacked together, as an array of shape (batch_size, 2) for your addition model (the last batch may be smaller).

    The same applies at inference time. .predict() also expects a collection of samples (and likewise accepts a batch_size argument), and it treats the first axis of the array as the sample axis. Since one sample of your addition model has shape (2,), the array you pass to .predict() must have shape (None, 2), where None means any number of samples. np.array([3, 4]) has shape (2,), which does not match, hence the error; np.array([[3, 4]]) has shape (1, 2), i.e. a batch containing one sample.

    Note that .predict() likewise returns one prediction per input sample, so for these models its output has shape (num_samples, 1). To inspect your model, you can view its graph in TensorBoard, which also shows the shape of each layer. For a simpler view, model.summary() prints each layer with its output shape.
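The answer above can be checked without the trained models. The sketch below is NumPy-only and rests on stated assumptions: check_batch is a hypothetical, simplified imitation of the shape check Keras performs (not its real code), and predict_add / predict_sqr are hypothetical exact stand-ins for model_add.predict and model_sqr.predict. It shows how fit() roughly slices the training array into batches (ignoring shuffling), that a single (2,) sample is rejected, and how batch-shaped inputs make the multiplication work, even for several pairs at once:

```python
import numpy as np

# Hypothetical, simplified imitation of the check applied to a model whose
# expected input shape is (None, n_features); NOT Keras's actual code.
def check_batch(x, n_features):
    x = np.asarray(x, dtype=np.float32)
    if x.ndim != 2 or x.shape[1] != n_features:
        raise ValueError(
            f"Expected shape (None, {n_features}), "
            f"but input has incompatible shape {x.shape}"
        )
    return x

# Hypothetical exact stand-ins for model_add.predict and model_sqr.predict:
# each takes a batch and returns one prediction per sample, shape (num_samples, 1).
def predict_add(x):
    x = check_batch(x, 2)
    return x.sum(axis=1, keepdims=True)

def predict_sqr(x):
    x = check_batch(x, 1)
    return np.square(x)

# Roughly what fit(batch_size=32) does with the training array: slice it along
# the first (sample) axis, so every batch fed to the model is 2-D.
X_train = np.random.rand(1000, 2)
batches = [X_train[i:i + 32] for i in range(0, len(X_train), 32)]
print(batches[0].shape, batches[-1].shape)  # (32, 2) (8, 2)

# A single sample of shape (2,) is rejected, as in the question.
try:
    predict_add(np.array([3, 4]))
except ValueError as e:
    print(e)

# Batches are accepted: here two pairs are multiplied in one call.
pairs = np.array([[3, 4], [5, 6]])   # shape (2, 2): two samples
a, b = pairs[:, :1], pairs[:, 1:]    # each of shape (2, 1)
product = (predict_sqr(predict_add(pairs)) - predict_sqr(a) - predict_sqr(b)) / 2
print(product.shape)  # (2, 1): products 12 and 30
```

Because predict_add returns shape (num_samples, 1), its output is already a valid (None, 1) batch for the square model, which is why the two calls can be chained directly.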