Tags: tensorflow, keras, mathematical-optimization

Getting bad performance with a simple TensorFlow model


I'm trying to experiment with a simple TensorFlow model built with Keras, but I can't figure out why I'm getting such poor predictions. Here's the model:

import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x_train = np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = np.asarray([.25, .5, .2, 2.5, 12.5])

opt = keras.optimizers.Adam(lr=0.01)

model = Sequential()
model.add(Dense(1, activation="relu", input_shape=(x_train.shape[1:])))
model.add(Dense(9, activation="relu"))
model.add(Dense(1, activation="relu"))

model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)

print(model.predict(np.asarray([[5]])))

As you can see, it should learn to divide the input by two. However, the loss sits at 32.5705 and refuses to change whatsoever over the epochs (even if I do something crazy like 100 epochs, it's always that loss). Is there anything you can see that I'm doing horribly wrong here? The prediction for any value seems to be 0.

It also seems to switch randomly between performing as expected and the weird behavior described above. I re-ran it and got a loss of 0.0019 after 200 epochs, but if I re-run it with all the same parameters a second later, the loss stays around 30 as before. What's going on here?
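
One way to pin down this run-to-run switching is to seed every source of randomness before building the model, so successive runs start from identical weights. A minimal sketch, assuming a TF1-era standalone Keras setup like the code above:

import random
import numpy as np
import tensorflow as tf

# Pin every source of randomness so each run starts from the same weights.
np.random.seed(42)       # numpy RNG (Keras weight initializers draw from this)
random.seed(42)          # Python's built-in RNG
tf.set_random_seed(42)   # TF1 API; on TensorFlow 2.x use tf.random.set_seed(42)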


Solution

  • Some reasons I can think of:

    1. The training set is too small.
    2. The learning rate is too high.
    3. The last layer should just be a linear layer.
    4. For some runs the ReLU units are dying (see the dead ReLU problem), and the network weights don't change after that, which is why you see the same loss value; the sketch after this list makes this concrete.
    5. In that case a tanh activation may provide better conditioning for optimization.
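
    A minimal numpy sketch of the dead-ReLU failure in point 4 (hypothetical weight values chosen to force the dead state): a ReLU unit whose pre-activation is negative for every training input outputs zero everywhere, and its gradient is zero too, so no update can ever revive it.

    import numpy as np

    # Single ReLU unit: out = max(0, w*x + b), with hypothetical "unlucky" weights.
    w, b = -1.0, -0.5
    x = np.asarray([.5, 1.0, .4, 5, 25])   # the inputs from the question

    pre = w * x + b                        # negative for every input
    out = np.maximum(0., pre)              # [0. 0. 0. 0. 0.] -> the unit is dead
    grad = (pre > 0).astype(float)         # ReLU derivative: zero everywhere,
                                           # so gradient descent never changes w or b
    print(out, grad)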

    I made a few changes to your code based on what I commented, and I get decent results.

    import keras
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    
    x_train = np.random.random((50000, 1))   # was: np.asarray([[.5], [1.0], [.4], [5], [25]])
    
    y_train = x_train / 2.   # TODO: add a small amount of noise to y; was: np.asarray([.25, .5, .2, 2.5, 12.5])
    
    opt = keras.optimizers.Adam(lr=0.0005, clipvalue=0.5)
    
    model = Sequential()
    model.add(Dense(1, activation="tanh", input_shape=x_train.shape[1:]))
    model.add(Dense(9, activation="tanh"))
    model.add(Dense(1, activation=None))
    
    model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
    model.fit(x_train, y_train, shuffle=True, epochs=10)
    
    print(model.predict(np.asarray([[.4322]])))   # explicit (1, 1) batch shape
    

    Output:

    [[0.21410337]]
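
    To sanity-check the fit beyond a single point, the trained model can also be evaluated on fresh data drawn the same way (a small follow-up sketch; x_test and y_test are made up here):

    x_test = np.random.random((1000, 1))
    y_test = x_test / 2.
    loss, mse = model.evaluate(x_test, y_test, verbose=0)
    print(mse)   # should be close to zero if the mapping x -> x/2 was learned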