I started with a simple linear regression-style net, written in TensorFlow and largely based on the MNIST for beginners tutorial. There are 7 input variables and 1 output variable, all on a continuous scale. With this model, the outputs all hovered around 1, which made sense because the target output set is dominated by values of 1. Here is a sample of outputs generated from the test data:
[ 0.95340264]
[ 0.94097006]
[ 0.96644485]
[ 0.95954728]
[ 0.93524933]
[ 0.94564033]
[ 0.94379318]
[ 0.92746377]
[ 0.94073343]
[ 0.98421943]
However, the accuracy never went above about 84%, so I decided to add a hidden layer. Now the output has entirely converged on a single value, for example:
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
and the accuracy remains between 82% and 84%. When I examine the obtained y value, the target y value, and the cross entropy for a single row of data over multiple training passes where the target output is 1, the obtained y value gradually approaches 1:
[ 0.]
[ 1.]
0.843537
[ 0.03999992]
[ 1.]
0.803543
[ 0.07999983]
[ 1.]
0.763534
[ 0.11999975]
[ 1.]
0.723541
[ 0.15999967]
[ 1.]
0.683544
and then hovers around 1 after it reaches the target:
[ 0.99136335]
[ 1.]
0.15912
[ 1.00366712]
[ 1.]
0.16013
[ 0.96366721]
[ 1.]
0.167638
[ 0.97597092]
[ 1.]
0.163856
[ 0.98827463]
[ 1.]
0.160069
However, when the target y value is 0.5, the network behaves as though the target were 1, approaching 0.5 and then overshooting it:
[ 0.47648361]
[ 0.5]
0.378556
[ 0.51296818]
[ 0.5]
0.350674
[ 0.53279752]
[ 0.5]
0.340844
[ 0.55262685]
[ 0.5]
0.331016
[ 0.57245618]
[ 0.5]
0.321187
while the cross entropy continues to shrink as if the output were actually converging on the target:
[ 0.94733644]
[ 0.5]
0.168714
[ 0.96027154]
[ 0.5]
0.164533
[ 0.97320664]
[ 0.5]
0.16035
[ 0.98614174]
[ 0.5]
0.156166
[ 0.99907684]
[ 0.5]
0.151983
Printing the obtained y, target y, and distance to target for the test data shows the same obtained y regardless of the target y:
5
[ 0.98564607]
[ 0.5]
[ 0.48564607]
6
[ 0.98564607]
[ 0.60000002]
[ 0.38564605]
7
[ 0.98564607]
[ 1.]
[ 0.01435393]
8
[ 0.98564607]
[ 1.]
[ 0.01435393]
9
[ 0.98564607]
[ 1.]
[ 0.01435393]
The code is below. a) Why, during training, does the algorithm treat the target y value as if it were always 1, and b) why does it produce the same output for every row during testing? Even if it "thinks" the target is always 1, there should be at least some variation in the test output, as there is in the training output.
import argparse
import dataset
import numpy as np
import os
import sys
import tensorflow as tf

FLAGS = None

def main(_):
    num_fields = 7
    batch_size = 100
    rating_field = 7
    outputs = 1
    hidden_units = 7

    train_data = dataset.Dataset("REPPED_RATING_TRAINING.txt", " ", num_fields, rating_field)
    td_len = len(train_data.data)
    test_data = dataset.Dataset("REPPED_RATING_TEST.txt", " ", num_fields, rating_field)
    test_len = len(test_data.data)
    test_input = test_data.data[:, :num_fields].reshape(test_len, num_fields)
    test_target = test_data.fulldata[:, rating_field].reshape(test_len, 1)

    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, num_fields], name="x")
        W1 = tf.Variable(tf.zeros([num_fields, hidden_units]))
        b1 = tf.Variable(tf.zeros([hidden_units]))
        W2 = tf.Variable(tf.zeros([hidden_units, outputs]))
        b2 = tf.Variable(tf.zeros([outputs]))
        H = tf.add(tf.matmul(x, W1), b1, name="H")
        y = tf.add(tf.matmul(H, W2), b2, name="y")
        y_ = tf.placeholder(tf.float32, [None, outputs])
        yd = tf.abs(y_ - y)
        cross_entropy = tf.reduce_mean(yd)
        train_step = tf.train.GradientDescentOptimizer(0.04).minimize(cross_entropy)
        init = tf.global_variables_initializer()
        saver = tf.train.Saver()

    with tf.Session(graph=graph) as sess:
        sess.run(init)
        train_input, train_target = train_data.batch(td_len)
        for _ in range(FLAGS.times):
            ts, yo, yt, ce = sess.run([train_step, y, y_, cross_entropy],
                                      feed_dict={x: train_input, y_: train_target})
            # print obtained y, target y, and cross entropy from a given row over 10 training instances
            print(yo[3])
            print(yt[3])
            print(ce)
            print()

        checkpoint_file = os.path.join(FLAGS.model_dir, 'saved-checkpoint')
        print("\nWriting checkpoint file: " + checkpoint_file)
        saver.save(sess, checkpoint_file)

        test_input, test_target = test_data.batch(test_len)
        ty, ty_, tce, tyd = sess.run([y, y_, cross_entropy, yd],
                                     feed_dict={x: test_input, y_: test_target})
        # print obtained y, target y, and distance to target for 10 random test rows
        for ix in range(10):
            print(ix)
            print(ty[ix])
            print(ty_[ix])
            print(tyd[ix])
            print()

        print('Ran times: ' + str(FLAGS.times))
        print('Acc: ' + str(1 - tce))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--times', type=int, default=100,
                        help='Number of passes to train')
    parser.add_argument('--model_dir', type=str,
                        default=os.path.join('.', 'tmp'),
                        help='Directory for storing model info')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
There are multiple problems in your code, and many of them could cause the network not to train properly:

- All weights and biases are initialized to zero. Because W2 starts at zero, no gradient ever reaches W1 or b1, and because H therefore stays at zero, no gradient ever reaches W2 either; only b2 is ever updated. The model collapses to a single learned constant, which is exactly what you see at test time: y equals b2 for every row. Initialize the weights randomly instead.
- There is no non-linear activation on the hidden layer (H is just tf.matmul(x, W1) + b1), so even with proper initialization the two-layer model is still purely linear and the hidden layer adds nothing.
- The quantity you call cross_entropy is actually the mean absolute error, not cross entropy. Its gradient with respect to b2 is the batch-averaged sign of the error, so the shared bias moves in fixed steps of the learning rate (the 0.04 increments in your training log) toward the majority of the targets, which are mostly 1; that is why a row with target 0.5 is overshot as if its target were 1.
- 1 - tce is one minus the mean absolute error, not an accuracy, so the "82-84%" figure is not measuring what you think it is.
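As a starting point, here is a minimal sketch of how the graph-construction block could look with random initialization, a non-linearity on the hidden layer, and a mean squared error loss. It is meant to drop into your `with graph.as_default():` block in place of the existing variable, H, y, and loss definitions; the stddev=0.1 initializer scale and the choice of ReLU and MSE are assumptions, not the only reasonable options, and the 0.04 learning rate is kept from your code:

    # Random initialization so gradients can actually flow through both layers
    W1 = tf.Variable(tf.truncated_normal([num_fields, hidden_units], stddev=0.1))
    b1 = tf.Variable(tf.zeros([hidden_units]))
    W2 = tf.Variable(tf.truncated_normal([hidden_units, outputs], stddev=0.1))
    b2 = tf.Variable(tf.zeros([outputs]))

    # Non-linearity on the hidden layer; without it the stack is still a linear model
    H = tf.nn.relu(tf.add(tf.matmul(x, W1), b1), name="H")
    y = tf.add(tf.matmul(H, W2), b2, name="y")

    y_ = tf.placeholder(tf.float32, [None, outputs])

    # Mean squared error, a common loss for continuous targets
    loss = tf.reduce_mean(tf.square(y_ - y))
    train_step = tf.train.GradientDescentOptimizer(0.04).minimize(loss)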
Also, if you are not already normalizing the inputs and outputs, you should do so; a sketch of one way to do that follows.
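This is a minimal sketch, assuming you want standard z-score normalization with statistics computed from the training split only; the normalize helper is illustrative, not part of your dataset module:

    import numpy as np

    def normalize(train, test):
        # Compute statistics on the training set only, then apply them to both splits
        mean = train.mean(axis=0)
        std = train.std(axis=0)
        std[std == 0] = 1.0  # guard against constant columns
        return (train - mean) / std, (test - mean) / std

    # Example usage with the input matrices from your script:
    # train_input, test_input = normalize(train_input, test_input)

Fitting the statistics on the training data and reusing them on the test data keeps the test set from leaking into training and keeps both splits on the same scale.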