I'm working on a regression problem using TensorFlow, and have created two models with a slight difference in their first Dense layer.
```python
# Create some regression data
X_regression = tf.range(0, 1000, 5)
y_regression = tf.range(100, 1100, 5)  # y = X + 100

# Split regression data into training and test sets
X_reg_train = X_regression[:150]
X_reg_test = X_regression[150:]
y_reg_train = y_regression[:150]
y_reg_test = y_regression[150:]
```
Model 1
```python
# Setup random seed
tf.random.set_seed(42)

model_1_reg = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model_1_reg.compile(loss=tf.keras.losses.mae,
                    optimizer=tf.keras.optimizers.Adam(),
                    metrics=['mae'])

model_1_reg.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)
```
Model 2
```python
# Setup random seed
tf.random.set_seed(42)

model_2_reg = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=(None, 1)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model_2_reg.compile(loss=tf.keras.losses.mae,
                    optimizer=tf.keras.optimizers.Adam(),
                    metrics=['mae'])

model_2_reg.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)
```
I'm confused about whether I should add the `input_shape` or not. Model 1's input shape becomes `(None, 1)` and Model 2's becomes `(None, None, 1)`. Both of them run, but they perform differently.

Model 2 makes sense since we're inputting an array, but doesn't that mean I only have a single node in the input layer, because I'm giving it a whole ndarray instead of the instances themselves? Model 1 makes sense too, since I want to feed each number into it individually.
So, which one makes more sense, and in which cases should I use each model? Also, why does passing `tf.expand_dims(X_reg_train, axis=-1)` as the `x` argument of `model_2_reg.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs=100)` work? I thought we're supposed to pass the data in as a batch, i.e. as an array of instances, so shouldn't it be wrapped inside an ndarray?
When you give the input shape via the `input_shape` parameter, you exclude the batch dimension. That's why you get `(None, None, 1)` for Model 2: TF inserts the leading `None` batch dimension in addition to the shape you provide. I'm actually a bit surprised that Model 2 runs with the additional `None` dimension. If you provide no `input_shape`, TensorFlow will infer it from the `x` argument of `model.fit` (so from your `tf.expand_dims(X_reg_train, axis=-1)`).
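If you do want to set the shape explicitly, here is a minimal sketch (same three-layer setup as your models, with an illustrative name `model_reg`) that passes only the per-sample feature shape, `input_shape=(1,)`, and lets Keras prepend the batch dimension:

```python
import tensorflow as tf

tf.random.set_seed(42)

# input_shape excludes the batch dimension: (1,) means one feature per sample.
# Keras prepends the batch axis itself, so the layer's input shape is (None, 1).
model_reg = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=(1,)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model_reg.summary()  # first Dense layer reports output shape (None, 100)
```

Built this way, the model expects inputs of shape `(None, 1)`, which matches what Model 1 infers on its own from the data you pass to `fit`.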
As for your second question, why does `tf.expand_dims(X_reg_train, axis=-1)` work? It is actually required. For a dense network, the input is expected to have shape `(samples, features)`, even with only one feature. The `tf.expand_dims` call provides exactly that shape, taking `X_reg_train` from `(150,)` to `(150, 1)`. You could do the same with `np.expand_dims`, as TF accepts NumPy arrays as input. Under the hood TF converts them to tensors anyway, so it makes no (real) difference whether you provide NumPy arrays or TensorFlow tensors.
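As a quick sanity check, here is a small sketch (reusing the training split from your question) showing that both the TF and the NumPy version produce the same `(samples, 1)` shape:

```python
import numpy as np
import tensorflow as tf

# Recreate the training split from the question: 150 samples, shape (150,)
X_reg_train = tf.range(0, 1000, 5)[:150]

X_tf = tf.expand_dims(X_reg_train, axis=-1)      # TF tensor, shape (150, 1)
X_np = np.expand_dims(X_reg_train.numpy(), -1)   # NumPy equivalent, shape (150, 1)

print(X_reg_train.shape, X_tf.shape, X_np.shape)  # (150,) (150, 1) (150, 1)
```

Either version can be passed straight to `model.fit`; Keras converts the NumPy array to a tensor internally.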