I am trying to learn Keras and have created a simple network. The feature data is [1, 2, 3, 4, 5] and the labels are [7, 9, 11, 13, 15] - a line with a slope of 2 and an intercept of 5 (Y = X * 2 + 5).
Here is the Keras network:
# simple keras example
#
# This solves for a line
import numpy as np
import keras
# configuration variables
samples = 5
base = 1
slope = 2
intercept = 5
# hyper-parameters
learning_rate = 0.01
epochs = 2000
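# a single Dense unit with a linear activation computes y = w*x + b,
# so one neuron is enough to fit the line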
model = keras.Sequential()
model.add(keras.layers.Dense(1, input_dim=1, activation=keras.activations.linear))
sgd = keras.optimizers.SGD(learning_rate=learning_rate)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['mean_absolute_error'])
X = np.array(range(base, base+samples))
Y = X * slope + intercept
model.fit(X, Y, epochs=epochs, batch_size=samples)
loss, mae = model.evaluate(X, Y)  # second value is the compiled metric (MAE), not accuracy
print('Loss: ', loss, ' MAE: ', mae)
# the layer's weights are [kernel, bias], i.e. [slope, intercept]
k_slope, k_intercept = model.layers[0].get_weights()
print('slope: ', k_slope, ' intercept: ', k_intercept)
The slope ends up at -0.1879 after the first epoch and does not improve from there. I suspect I am missing a parameter, a setting, or perhaps a function call on the model, but I can't figure out what it is.
Here is a TensorFlow network I am trying to reproduce in Keras. It converges to the correct answer in about 1300 epochs:
#simple linear regression with tensorflow
#
# This solves for a line
#
import tensorflow as tf
import numpy as np
# configuration variables
samples = 5
base = 1
slope = 2
intercept = 5
# hyper-parameters
learning_rate = 0.01
epochs = 2000
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
m = tf.Variable(0.0)
b = tf.Variable(0.0)
pred = tf.add(tf.multiply(x, m), b)
cost = tf.reduce_mean(tf.abs(y - pred))
me_first = tf.global_variables_initializer()
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
session = tf.Session()
session.run(me_first)
for i in range(epochs):
    X = np.array(range(base, base+samples))
    Y = X * slope + intercept
    t_slope, t_intercept, total_err, opt = session.run([m, b, cost, optimizer], feed_dict={x: X, y: Y})
    print('iter: ', i, ' intercept: ', t_intercept, ' slope: ', t_slope, ' error: ', total_err)
Ollin answered the question: the loss function was inappropriate for the network. "binary_crossentropy" should only be used when the labels are 1 or 0; in my case the labels can be any number. To make a network equivalent to my TensorFlow example, the loss function needs to be "mean_absolute_error" (or "mae" for short), which matches the tf.reduce_mean(tf.abs(y - pred)) cost above.
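As a minimal sketch, the fix is a one-line change to the compile call; everything else in the Keras code above stays the same:

# corrected loss: mean absolute error, same cost as the tensorflow version
model.compile(loss='mean_absolute_error', optimizer=sgd, metrics=['mean_absolute_error'])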
I did have the "metrics" field in my call to model.compile() set to "mean_absolute_error", and I incorrectly assumed that the metric would be used as the loss for the network. In fact, metrics are computed and reported but are not used by the training algorithm at all; they are there so the developer can see what other loss functions would report on the training data set.
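To illustrate the distinction, here is a minimal sketch (assuming the history keys match the strings passed to compile): only the loss drives the weight updates, while the metric is merely recorded each epoch.

# sketch: the metric is reported alongside the loss but never optimized
model.compile(loss='mean_absolute_error', optimizer=sgd,
              metrics=['mean_squared_error'])
history = model.fit(X, Y, epochs=epochs, batch_size=samples, verbose=0)
print(history.history['loss'][-1])                # what SGD minimized
print(history.history['mean_squared_error'][-1])  # reported only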
It is unfortunate that Keras fails silently in this case. It would be useful if, on seeing labels other than 0 or 1, it warned that "binary_crossentropy" should not be used as the loss function.