I am aiming for a sequential neural network with two neurons capable of reproducing a quadratic function. To do this, I chose the activation function of the first neuron to be lambda x: x**2, and that of the second neuron to be None.
Each neuron outputs A(ax + b), where A is the activation function, a is the weight of the given neuron, and b is the bias term. The output of the first neuron is passed on to the second neuron, and the output of that neuron is the result.
The form of the output of my network is then:

a2 * (a1*x + b1)**2 + b2

where a1, b1 are the weight and bias of the first neuron, and a2, b2 those of the second.
Training the model means adjusting the weights and biases of each neuron. Choosing a very simple set of parameters, i.e.:

a1 = b1 = a2 = b2 = 1

leads us to a parabola which should be perfectly learnable by the 2-neuron neural net described above:

f(x) = (x + 1)**2 + 1 = x**2 + 2*x + 2
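As a quick sanity check, the forward pass can be written out by hand in plain Python (the helper name tiny_net is just for illustration) and compared against the target parabola:

# Forward pass of the 2-neuron network, written out by hand.
# a1, b1 belong to the first neuron; a2, b2 to the second.
def tiny_net(x, a1=1.0, b1=1.0, a2=1.0, b2=1.0):
    hidden = (a1 * x + b1) ** 2   # first neuron: squaring activation
    return a2 * hidden + b2       # second neuron: linear, no activation

# With all parameters equal to 1 this reproduces x**2 + 2*x + 2 exactly.
assert all(abs(tiny_net(x) - (x**2 + 2*x + 2)) < 1e-9 for x in (-2.0, 0.0, 0.5, 3.0))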
To implement the neural network, I do:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
Define function to be learned:
f = lambda x: x**2 + 2*x + 2
Generate training inputs and outputs using above function:
np.random.seed(42)
questions = np.random.rand(999)
solutions = f(questions)
Define neural network architecture:
model = tf.keras.Sequential([
tf.keras.layers.Dense(units=1, input_shape=[1],activation=lambda x: x**2),
tf.keras.layers.Dense(units=1, input_shape=[1],activation=None)
])
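As an optional check that this really is a four-parameter model (one weight and one bias per neuron), the layer summary can be printed:

model.summary()   # each Dense layer should report 2 trainable parameters, 4 in total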
Compile net:
model.compile(loss='mean_squared_error',
optimizer=tf.keras.optimizers.Adam(0.1))
Train the model:
history = model.fit(questions, solutions, epochs=999, batch_size = 1, verbose=1)
Generate predictions of f(x) using the newly trained model:
np.random.seed(43)
test_questions = np.random.rand(100)
test_solutions = f(test_questions)
test_answers = model.predict(test_questions)
Visualize result:
plt.figure(figsize=(10,6))
plt.scatter(test_questions, test_solutions, c='r', label='solutions')
plt.scatter(test_questions, test_answers, c='b', label='answers')
plt.legend()
plt.show()
The red dots form the parabola our model was supposed to learn, while the blue dots form the curve it actually learnt. This approach clearly did not work.
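To put a number on how far off the predictions are, one can also compute the mean squared error on the test set (model.predict returns a column vector, hence the flatten):

test_mse = np.mean((test_answers.flatten() - test_solutions) ** 2)   # extra check, not part of training
print(f"test MSE: {test_mse:.4f}")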
What is wrong with the approach above and how to make the neural net actually learn the parabola?
To add to @Zabob's answer: you have used the Adam optimizer, and while it is considered quite robust, I have found that it is sensitive to the initial learning rate and can produce unexpected results (as in your case, where it learns the opposite curve). If you change the optimizer to SGD:
model.compile(loss='mean_squared_error',
optimizer=tf.keras.optimizers.SGD(0.01))
Then in less than 100 epochs, you can get an optimized network:
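A minimal end-to-end sketch of that change, reusing the data and architecture from the question (the exact epoch count needed may vary with initialization):

# Same two-neuron architecture as in the question, retrained with SGD.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1], activation=lambda x: x**2),
    tf.keras.layers.Dense(units=1, activation=None)
])
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.SGD(0.01))
model.fit(questions, solutions, epochs=100, batch_size=1, verbose=0)

# The predictions should now lie on top of the target parabola.
test_answers = model.predict(test_questions)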