Tags: tensorflow, keras, mathematical-optimization

Getting bad performance with a simple TensorFlow model


I'm trying to experiment with a simple TensorFlow model built with Keras, but I can't figure out why I'm getting such poor predictions. Here's the model:

import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x_train = np.asarray([[.5], [1.0], [.4], [5], [25]])
y_train = np.asarray([.25, .5, .2, 2.5, 12.5])

opt = keras.optimizers.Adam(lr=0.01)

model = Sequential()
model.add(Dense(1, activation="relu", input_shape=(x_train.shape[1:])))
model.add(Dense(9, activation="relu"))
model.add(Dense(1, activation="relu"))

model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
model.fit(x_train, y_train, shuffle=True, epochs=10)

print(model.predict(np.asarray([[5]])))

As you can see, it should learn to divide the input by two. However, the loss sits at 32.5705 and refuses to change whatsoever over the epochs (even if I do something crazy like 100 epochs, it's always that loss). Is there anything you can see that I'm doing horribly wrong here? The prediction for any value seems to be 0.

It also seems to switch randomly between performing as expected and the weird behavior described above. I re-ran it and got a loss of 0.0019 after 200 epochs, but if I re-run it with all the same parameters a second later, the loss stays around 30 as before. What's going on here?
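
One way to pin down this run-to-run switching is to seed every source of randomness before building the model, so successive runs start from identical weights. A minimal sketch, assuming a TF1-era standalone Keras setup like the code above:

import random
import numpy as np
import tensorflow as tf

# Pin every source of randomness so each run starts from the same weights.
np.random.seed(42)       # numpy RNG (Keras weight initializers draw from this)
random.seed(42)          # Python's built-in RNG
tf.set_random_seed(42)   # TF1 API; on TensorFlow 2.x use tf.random.set_seed(42)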


Solution

  • Some reasons I can think of:

    1. The training set is too small.
    2. The learning rate is too high.
    3. The last layer should just be a linear layer.
    4. For some runs the ReLU units are dying (see the dead ReLU problem), and the network weights don't change after that, which is why you see the same loss value; the sketch after this list makes this concrete.
    5. In that case a tanh activation may provide better conditioning for optimization.
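
    A minimal numpy sketch of the dead-ReLU failure in point 4 (hypothetical weight values chosen to force the dead state): a ReLU unit whose pre-activation is negative for every training input outputs zero everywhere, and its gradient is zero too, so no update can ever revive it.

    import numpy as np

    # Single ReLU unit: out = max(0, w*x + b), with hypothetical "unlucky" weights.
    w, b = -1.0, -0.5
    x = np.asarray([.5, 1.0, .4, 5, 25])   # the inputs from the question

    pre = w * x + b                        # negative for every input
    out = np.maximum(0., pre)              # [0. 0. 0. 0. 0.] -> the unit is dead
    grad = (pre > 0).astype(float)         # ReLU derivative: zero everywhere,
                                           # so gradient descent never changes w or b
    print(out, grad)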

    I made a few changes to your code based on what I commented, and I get decent results.

    import keras
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    
    x_train = np.random.random((50000, 1))   # was: np.asarray([[.5], [1.0], [.4], [5], [25]])
    
    y_train = x_train / 2.   # TODO: add a small amount of noise to y; was: np.asarray([.25, .5, .2, 2.5, 12.5])
    
    opt = keras.optimizers.Adam(lr=0.0005, clipvalue=0.5)
    
    model = Sequential()
    model.add(Dense(1, activation="tanh", input_shape=x_train.shape[1:]))
    model.add(Dense(9, activation="tanh"))
    model.add(Dense(1, activation=None))
    
    model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mean_squared_error'])
    model.fit(x_train, y_train, shuffle=True, epochs=10)
    
    print(model.predict(np.asarray([[.4322]])))   # explicit (1, 1) batch shape
    

    Output:

    [[0.21410337]]
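
    To sanity-check the fit beyond a single point, the trained model can also be evaluated on fresh data drawn the same way (a small follow-up sketch; x_test and y_test are made up here):

    x_test = np.random.random((1000, 1))
    y_test = x_test / 2.
    loss, mse = model.evaluate(x_test, y_test, verbose=0)
    print(mse)   # should be close to zero if the mapping x -> x/2 was learned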