I started with a simple linear regression-style net, written in TensorFlow and largely based on the MNIST for beginners tutorial. There are 7 input variables and 1 output variable, all on a continuous scale. With this model, the outputs all hovered around 1, which made sense because the target output set is dominated by values of 1. Here is a sample of outputs generated from the test data:
[ 0.95340264]
[ 0.94097006]
[ 0.96644485]
[ 0.95954728]
[ 0.93524933]
[ 0.94564033]
[ 0.94379318]
[ 0.92746377]
[ 0.94073343]
[ 0.98421943]
However, the accuracy never went above about 84%, so I decided to add a hidden layer. Now the output has entirely converged on a single value, for example:
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
and the accuracy remains between 82% and 84%. When I examine the obtained y value, the target y value, and the cross entropy for a single row of data over multiple training passes where the target output is 1, the obtained y value gradually approaches 1:
[ 0.]
[ 1.]
0.843537
[ 0.03999992]
[ 1.]
0.803543
[ 0.07999983]
[ 1.]
0.763534
[ 0.11999975]
[ 1.]
0.723541
[ 0.15999967]
[ 1.]
0.683544
and then hovers around 1 after it reaches the target:
[ 0.99136335]
[ 1.]
0.15912
[ 1.00366712]
[ 1.]
0.16013
[ 0.96366721]
[ 1.]
0.167638
[ 0.97597092]
[ 1.]
0.163856
[ 0.98827463]
[ 1.]
0.160069
However, when the target y value is 0.5, the network behaves as though the target were 1, approaching 0.5 and then overshooting it:
[ 0.47648361]
[ 0.5]
0.378556
[ 0.51296818]
[ 0.5]
0.350674
[ 0.53279752]
[ 0.5]
0.340844
[ 0.55262685]
[ 0.5]
0.331016
[ 0.57245618]
[ 0.5]
0.321187
while the cross entropy continues to shrink as if the output were actually converging on the target:
[ 0.94733644]
[ 0.5]
0.168714
[ 0.96027154]
[ 0.5]
0.164533
[ 0.97320664]
[ 0.5]
0.16035
[ 0.98614174]
[ 0.5]
0.156166
[ 0.99907684]
[ 0.5]
0.151983
Printing the obtained y, target y, and distance to target for the test data shows the same obtained y regardless of the target y:
5
[ 0.98564607]
[ 0.5]
[ 0.48564607]
6
[ 0.98564607]
[ 0.60000002]
[ 0.38564605]
7
[ 0.98564607]
[ 1.]
[ 0.01435393]
8
[ 0.98564607]
[ 1.]
[ 0.01435393]
9
[ 0.98564607]
[ 1.]
[ 0.01435393]
The code is below. a) Why, during training, does the algorithm treat the target y value as if it were always 1, and b) why does it produce the same output for every row during testing? Even if it "thinks" the target is always 1, there should be at least some variation in the test output, as there is in the training output.
import argparse
import dataset
import numpy as np
import os
import sys
import tensorflow as tf

FLAGS = None

def main(_):
    num_fields = 7
    batch_size = 100
    rating_field = 7
    outputs = 1
    hidden_units = 7

    train_data = dataset.Dataset("REPPED_RATING_TRAINING.txt", " ", num_fields, rating_field)
    td_len = len(train_data.data)
    test_data = dataset.Dataset("REPPED_RATING_TEST.txt", " ", num_fields, rating_field)
    test_len = len(test_data.data)
    test_input = test_data.data[:, :num_fields].reshape(test_len, num_fields)
    test_target = test_data.fulldata[:, rating_field].reshape(test_len, 1)

    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, num_fields], name="x")
        W1 = tf.Variable(tf.zeros([num_fields, hidden_units]))
        b1 = tf.Variable(tf.zeros([hidden_units]))
        W2 = tf.Variable(tf.zeros([hidden_units, outputs]))
        b2 = tf.Variable(tf.zeros([outputs]))
        H = tf.add(tf.matmul(x, W1), b1, name="H")
        y = tf.add(tf.matmul(H, W2), b2, name="y")
        y_ = tf.placeholder(tf.float32, [None, outputs])
        yd = tf.abs(y_ - y)
        cross_entropy = tf.reduce_mean(yd)
        train_step = tf.train.GradientDescentOptimizer(0.04).minimize(cross_entropy)
        init = tf.global_variables_initializer()
        saver = tf.train.Saver()

    with tf.Session(graph=graph) as sess:
        sess.run(init)
        train_input, train_target = train_data.batch(td_len)
        for _ in range(FLAGS.times):
            ts, yo, yt, ce = sess.run([train_step, y, y_, cross_entropy],
                                      feed_dict={x: train_input, y_: train_target})
            # print obtained y, target y, and cross entropy from a given row over 10 training instances
            print(yo[3])
            print(yt[3])
            print(ce)
            print()

        checkpoint_file = os.path.join(FLAGS.model_dir, 'saved-checkpoint')
        print("\nWriting checkpoint file: " + checkpoint_file)
        saver.save(sess, checkpoint_file)

        test_input, test_target = test_data.batch(test_len)
        ty, ty_, tce, tyd = sess.run([y, y_, cross_entropy, yd],
                                     feed_dict={x: test_input, y_: test_target})
        # print obtained y, target y, and distance to target for 10 random test rows
        for ix in range(10):
            print(ix)
            print(ty[ix])
            print(ty_[ix])
            print(tyd[ix])
            print()

        print('Ran times: ' + str(FLAGS.times))
        print('Acc: ' + str(1 - tce))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--times', type=int, default=100,
                        help='Number of passes to train')
    parser.add_argument('--model_dir', type=str,
                        default=os.path.join('.', 'tmp'),
                        help='Directory for storing model info')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
There are multiple problems in your code, and many of them could cause the network not to train properly:

- All weights and biases are initialized to zero. Because W2 starts at zero, no gradient ever reaches W1 or b1, and because H therefore stays at zero, no gradient ever reaches W2 either; only b2 is ever updated. The model collapses to a single learned constant, which is exactly what you see at test time: y equals b2 for every row. Initialize the weights randomly instead.
- There is no non-linear activation on the hidden layer (H is just tf.matmul(x, W1) + b1), so even with proper initialization the two-layer model is still purely linear and the hidden layer adds nothing.
- The quantity you call cross_entropy is actually the mean absolute error, not cross entropy. Its gradient with respect to b2 is the batch-averaged sign of the error, so the shared bias moves in fixed steps of the learning rate (the 0.04 increments in your training log) toward the majority of the targets, which are mostly 1; that is why a row with target 0.5 is overshot as if its target were 1.
- 1 - tce is one minus the mean absolute error, not an accuracy, so the "82-84%" figure is not measuring what you think it is.
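As a starting point, here is a minimal sketch of how the graph-construction block could look with random initialization, a non-linearity on the hidden layer, and a mean squared error loss. It is meant to drop into your `with graph.as_default():` block in place of the existing variable, H, y, and loss definitions; the stddev=0.1 initializer scale and the choice of ReLU and MSE are assumptions, not the only reasonable options, and the 0.04 learning rate is kept from your code:

    # Random initialization so gradients can actually flow through both layers
    W1 = tf.Variable(tf.truncated_normal([num_fields, hidden_units], stddev=0.1))
    b1 = tf.Variable(tf.zeros([hidden_units]))
    W2 = tf.Variable(tf.truncated_normal([hidden_units, outputs], stddev=0.1))
    b2 = tf.Variable(tf.zeros([outputs]))

    # Non-linearity on the hidden layer; without it the stack is still a linear model
    H = tf.nn.relu(tf.add(tf.matmul(x, W1), b1), name="H")
    y = tf.add(tf.matmul(H, W2), b2, name="y")

    y_ = tf.placeholder(tf.float32, [None, outputs])

    # Mean squared error, a common loss for continuous targets
    loss = tf.reduce_mean(tf.square(y_ - y))
    train_step = tf.train.GradientDescentOptimizer(0.04).minimize(loss)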
Also, if you are not already normalizing the inputs and outputs, you should do so; a sketch of one way to do that follows.
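This is a minimal sketch, assuming you want standard z-score normalization with statistics computed from the training split only; the normalize helper is illustrative, not part of your dataset module:

    import numpy as np

    def normalize(train, test):
        # Compute statistics on the training set only, then apply them to both splits
        mean = train.mean(axis=0)
        std = train.std(axis=0)
        std[std == 0] = 1.0  # guard against constant columns
        return (train - mean) / std, (test - mean) / std

    # Example usage with the input matrices from your script:
    # train_input, test_input = normalize(train_input, test_input)

Fitting the statistics on the training data and reusing them on the test data keeps the test set from leaking into training and keeps both splits on the same scale.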