I'm working on a regression problem using TensorFlow, and have created two models with a slight difference in their first Dense layer.
```python
# Create some regression data
X_regression = tf.range(0, 1000, 5)
y_regression = tf.range(100, 1100, 5)  # y = X + 100

# Split regression data into training and test sets
X_reg_train = X_regression[:150]
X_reg_test = X_regression[150:]
y_reg_train = y_regression[:150]
y_reg_test = y_regression[150:]
```
Model 1
```python
# Setup random seed
tf.random.set_seed(42)

model_1_reg = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model_1_reg.compile(loss=tf.keras.losses.mae,
                    optimizer=tf.keras.optimizers.Adam(),
                    metrics=['mae'])

model_1_reg.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)
```
Model 2
```python
# Setup random seed
tf.random.set_seed(42)

model_2_reg = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=(None, 1)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model_2_reg.compile(loss=tf.keras.losses.mae,
                    optimizer=tf.keras.optimizers.Adam(),
                    metrics=['mae'])

model_2_reg.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)
```
I'm confused about whether I should add the `input_shape` or not. Model 1's input shape becomes `(None, 1)` and Model 2's becomes `(None, None, 1)`. Both of them run, but they perform differently.

Model 2 makes sense since we're inputting an array, but doesn't that mean I only have a single node in the input layer, because I'm giving it a whole ndarray instead of the instances themselves? Model 1 makes sense too, since I want to feed each number into it individually.
So, which one makes more sense, and in which cases should I use each model? Also, why does passing `tf.expand_dims(X_reg_train, axis=-1)` as the `x` argument of `model_2_reg.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)` work? I thought we're supposed to pass the data in as a batch, i.e. as an array of instances, so shouldn't it be wrapped inside an ndarray?
When you give the input shape via the `input_shape` parameter, you exclude the batch dimension. That's why you get `(None, None, 1)` for Model 2: TF inserts the leading `None` batch dimension in addition to the shape you provide. I'm actually a bit surprised that Model 2 runs with the additional `None` dimension. If you provide no `input_shape`, TensorFlow will infer it from the `x` argument of `model.fit` (so from your `tf.expand_dims(X_reg_train, axis=-1)`).
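If you do want to set the shape explicitly, here is a minimal sketch (same three-layer setup as your models, with an illustrative name `model_reg`) that passes only the per-sample feature shape, `input_shape=(1,)`, and lets Keras prepend the batch dimension:

```python
import tensorflow as tf

tf.random.set_seed(42)

# input_shape excludes the batch dimension: (1,) means one feature per sample.
# Keras prepends the batch axis itself, so the layer's input shape is (None, 1).
model_reg = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=(1,)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model_reg.summary()  # first Dense layer reports output shape (None, 100)
```

Built this way, the model expects inputs of shape `(None, 1)`, which matches what Model 1 infers on its own from the data you pass to `fit`.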
As for your second question, why does `tf.expand_dims(X_reg_train, axis=-1)` work? It is actually required. For a dense network, the input is expected to have shape `(samples, features)`, even with only one feature. The `tf.expand_dims` call provides exactly that shape, taking `X_reg_train` from `(150,)` to `(150, 1)`. You could do the same with `np.expand_dims`, as TF accepts NumPy arrays as input. Under the hood TF converts them to tensors anyway, so it makes no (real) difference whether you provide NumPy arrays or TensorFlow tensors.
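As a quick sanity check, here is a small sketch (reusing the training split from your question) showing that both the TF and the NumPy version produce the same `(samples, 1)` shape:

```python
import numpy as np
import tensorflow as tf

# Recreate the training split from the question: 150 samples, shape (150,)
X_reg_train = tf.range(0, 1000, 5)[:150]

X_tf = tf.expand_dims(X_reg_train, axis=-1)      # TF tensor, shape (150, 1)
X_np = np.expand_dims(X_reg_train.numpy(), -1)   # NumPy equivalent, shape (150, 1)

print(X_reg_train.shape, X_tf.shape, X_np.shape)  # (150,) (150, 1) (150, 1)
```

Either version can be passed straight to `model.fit`; Keras converts the NumPy array to a tensor internally.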