I'm trying to approximate the log function on the domain from 1 to 100 with a neural network, using TensorFlow. The results are not as good as I expected and I would like to understand why. I use the following code:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
## == data to be approximated == ##
x_grid = np.array([np.linspace(1, 100, 100)]).T
y_grid = np.log(x_grid)
def deepnn(x_val, prior):
    """
    A neural network with input values x. Its parameters might be constrained according to a prior.
    """
    ## == input layer == ##
    if prior:
        w_in = tf.constant(1., shape=[1, 2])  # fixed to one
        b_in = tf.constant([-1., -20.])       # fixed along the kinks of the log approximation
    else:
        w_in = weight_variable([1, 2])
        b_in = bias_variable([2])
    f_in = tf.matmul(x_val, w_in) + b_in
    ## == first hidden layer == ##
    g_1 = tf.nn.relu(f_in)
    ## == output layer == ##
    w_out = weight_variable([2, 1])
    b_out = bias_variable([1])
    y_predict = tf.matmul(g_1, w_out) + b_out
    return y_predict

def weight_variable(shape):
    """
    generate a weight variable of a given shape
    """
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    """
    generates a bias variable of a given shape
    """
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
x_given = tf.placeholder(tf.float32, [None, 1])
y_out = deepnn(x_given, False)
y = tf.placeholder(tf.float32, [None, 1])
squared_deltas = tf.square(y_out - y)
loss = tf.reduce_sum(squared_deltas)
optimizer = tf.train.AdamOptimizer(1e-3)
train = optimizer.minimize(loss)
sess = tf.InteractiveSession()
init = tf.global_variables_initializer()
sess.run(init)
for i in range(50000):
    sess.run(train, {x_given: x_grid, y: y_grid})

print(sess.run(loss, {x_given: x_grid, y: y_grid}))
sess.close()
The neural network deepnn(x_val, prior) can take two forms. If prior is true, the parameters of the input-layer function tf.matmul(x_val, w_in) + b_in are fixed to w_in = 1 and b_in = [-1, -20]; these biases force the two hidden ReLUs to switch on at x = 1 and x = 20. If prior is false, the weights w are initialized to small random values and the biases b to 0.1. (The values, as well as the computer code, are inspired by a TensorFlow guide.) The inputs are passed on to a hidden layer with rectifier activation functions and then to an output layer. Whether the network should adhere to the prior or not is set in the line y_out = deepnn(x_given, False).
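To make explicit what the prior encodes, here is a small NumPy sketch (separate from the training code above): with w_in = 1 and b_in = [-1, -20] the hidden layer computes ReLU(x - 1) and ReLU(x - 20), and fitting only the output layer by ordinary least squares already yields a piecewise-linear fit of log(x) with kinks at x = 1 and x = 20.
import numpy as np

x = np.linspace(1, 100, 100)[:, None]          # same grid as x_grid
h = np.maximum(x + np.array([-1., -20.]), 0.)  # ReLU(x - 1), ReLU(x - 20)
A = np.hstack([h, np.ones_like(x)])            # hidden activations plus an output-bias column
coef, *_ = np.linalg.lstsq(A, np.log(x), rcond=None)
print("max abs error:", np.abs(A @ coef - np.log(x)).max())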
The network without the prior restrictions produces inferior results almost all of the time compared to the network with the prior: its output essentially resembles a linear function. Curiously, the unrestricted network once produced a very good solution, but I could not replicate it in subsequent trials. The results are visualized in the figure below.
Can somebody kindly explain why I cannot train the network well?
I haven't checked your code thoroughly, but it seems that you are not really using a non-linear network. Your network is a shallow one (only one hidden layer), so to be deep (as the function name suggests) it needs more layers. I also think you need more nodes per layer. Try at least two hidden layers.
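For example, a deeper unconstrained variant could look like the sketch below. It reuses the weight_variable and bias_variable helpers from the question, and the hidden-layer width of 16 is an arbitrary choice.
def deepnn_deeper(x_val):
    """
    Sketch of a deeper network: two hidden ReLU layers with 16 units each.
    """
    w_1 = weight_variable([1, 16])
    b_1 = bias_variable([16])
    g_1 = tf.nn.relu(tf.matmul(x_val, w_1) + b_1)  # first hidden layer
    w_2 = weight_variable([16, 16])
    b_2 = bias_variable([16])
    g_2 = tf.nn.relu(tf.matmul(g_1, w_2) + b_2)    # second hidden layer
    w_out = weight_variable([16, 1])
    b_out = bias_variable([1])
    return tf.matmul(g_2, w_out) + b_out           # linear output layer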
BTW there is a function that does exactly what it says: tf.nn.xw_plus_b
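For instance, the input-layer line could be written as:
f_in = tf.nn.xw_plus_b(x_val, w_in, b_in)  # computes tf.matmul(x_val, w_in) + b_in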