Tags: python, machine-learning, neural-network, tensorflow, logistic-regression

Binary classification in TensorFlow, unexpected large values for loss and accuracy


I am trying to use a deep neural network architecture to classify against a binary label with values -1 and +1. Here is my code to do it in TensorFlow.

import tensorflow as tf
import numpy as np
from preprocess import create_feature_sets_and_labels

train_x,train_y,test_x,test_y = create_feature_sets_and_labels()

x = tf.placeholder('float', [None, 5])  # each example has 5 features
y = tf.placeholder('float')             # binary label

n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500

n_classes = 1
batch_size = 100

def neural_network_model(data):

    hidden_1_layer = {'weights':tf.Variable(tf.random_normal([5, n_nodes_hl1])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}

    hidden_2_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}

    hidden_3_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}

    output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                      'biases':tf.Variable(tf.random_normal([n_classes]))}


    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    l3 = tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.relu(l3)

    output = tf.transpose(tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases']))
    return output



def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(prediction, y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 10

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while i < len(train_x):
                start = i
                end = i + batch_size
                batch_x = np.array(train_x[start:end])
                batch_y = np.array(train_y[start:end])

                _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                              y: batch_y})
                epoch_loss += c
                i+=batch_size

            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)

        # correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        # accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

        print (test_x.shape)
        accuracy = tf.nn.l2_loss(prediction-y,name="squared_error_test_cost")/test_x.shape[0]
        print('Accuracy:', accuracy.eval({x: test_x, y: test_y}))

train_neural_network(x)

This is the output I get when I run this:

('Epoch', 0, 'completed out of', 10, 'loss:', -8400.2424869537354)
('Epoch', 1, 'completed out of', 10, 'loss:', -78980.956665039062)
('Epoch', 2, 'completed out of', 10, 'loss:', -152401.86713409424)
('Epoch', 3, 'completed out of', 10, 'loss:', -184913.46441650391)
('Epoch', 4, 'completed out of', 10, 'loss:', -165563.44775390625)
('Epoch', 5, 'completed out of', 10, 'loss:', -360394.44857788086)
('Epoch', 6, 'completed out of', 10, 'loss:', -475697.51550292969)
('Epoch', 7, 'completed out of', 10, 'loss:', -588638.92993164062)
('Epoch', 8, 'completed out of', 10, 'loss:', -745006.15966796875)
('Epoch', 9, 'completed out of', 10, 'loss:', -900172.41955566406)
(805, 5)
('Accuracy:', 5.8077128e+09)

I can't tell whether the values I am getting are correct, as there is a real dearth of non-MNIST binary classification examples. The accuracy is nothing like what I expected: I was expecting a percentage, not that large value.

I am also somewhat unsure of the theory behind machine learning, which is why I can't judge the correctness of my TensorFlow approach.

Can someone please tell me whether my approach to binary classification is correct? And is the accuracy part of my code correct?


Solution

  • From this:

    a binary label with values -1 and +1

    . . . I am assuming the values in your train_y and test_y are actually -1.0 and +1.0.

    This is not going to work very well with your chosen loss function, sigmoid_cross_entropy_with_logits, which assumes labels of 0.0 and 1.0. The negative y values are causing mayhem! The loss function itself is a good choice for binary classification, though, so I suggest changing your y values to 0 and 1 (see the sketch below).
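
    To see why, here is a quick NumPy sketch of the numerically stable expression that sigmoid_cross_entropy_with_logits evaluates, max(z, 0) - z*y + log(1 + exp(-|z|)). With a label of -1 the loss is unbounded below, so the optimizer can push it towards minus infinity, which matches the diverging negative epoch loss you printed:

    import numpy as np

    def sigmoid_xent(logits, labels):
        # Stable form of the sigmoid cross-entropy, as computed by
        # tf.nn.sigmoid_cross_entropy_with_logits:
        #   max(z, 0) - z*y + log(1 + exp(-|z|))
        return (np.maximum(logits, 0) - logits * labels
                + np.log1p(np.exp(-np.abs(logits))))

    z = np.array([-10.0])                     # a confidently negative logit
    print(sigmoid_xent(z, np.array([0.0])))   # ~0.0: valid 0/1 label, small loss
    print(sigmoid_xent(z, np.array([-1.0])))  # ~-10.0: a -1 label lets the loss sink without bound

    The remapping itself is a one-liner before training (assuming train_y and test_y are plain Python lists of -1.0/+1.0 floats; preprocess isn't shown, so I can't confirm):

    train_y = [(label + 1) / 2 for label in train_y]
    test_y = [(label + 1) / 2 for label in test_y]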

    In addition, technically the output of your network is not the final prediction. sigmoid_cross_entropy_with_logits is designed for a network whose output layer would have a sigmoid transfer function: it takes the raw pre-sigmoid logits and applies the sigmoid internally, and you have got that right by feeding it the un-squashed output. So your training code appears correct (the check below illustrates this).
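
    A quick way to convince yourself (a small NumPy check with made-up values): the stable logits formula TensorFlow evaluates agrees with the naive cross-entropy computed on a sigmoid output.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z, y = 2.0, 1.0  # an example logit and a valid 0/1 label
    # Naive cross-entropy applied after the sigmoid:
    naive = -(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z)))
    # Stable form evaluated by sigmoid_cross_entropy_with_logits:
    stable = max(z, 0) - z * y + np.log1p(np.exp(-abs(z)))
    print(naive, stable)  # both ~0.1269, i.e. the sigmoid happens inside the loss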

    I'm not 100% sure about the tf.transpose, though. Personally, I would see what happens if you remove it, i.e.

    output = tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases'])
    
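    If you do remove it, one thing to watch (an assumption on my part, since preprocess isn't shown): output then has shape [batch_size, 1], so the labels you feed for y should be reshaped to match, e.g.

    batch_y = np.array(train_y[start:end]).reshape(-1, 1)  # match the [batch, 1] logits
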

    Either way, this is the "logit" output, but not your prediction. The value of output can get large for very confident predictions, which, combined with the missing sigmoid function, probably explains the very large values you see later. So add a prediction tensor (this represents the probability/confidence that the example is in the positive class):

    prediction = tf.sigmoid(output)
    

    You can use that to calculate accuracy. Your accuracy calculation should not be based on L2 error, but on the count of correct predictions, closer to the code you had commented out (which appears to be from a multi-class example). For a true/false comparison in binary classification, you need to threshold the predictions and compare them with the true labels. Something like this:

     predicted_class = tf.greater(prediction,0.5)
     correct = tf.equal(predicted_class, tf.equal(y,1.0))
     accuracy = tf.reduce_mean( tf.cast(correct, 'float') )
    

    The accuracy value should be between 0.0 and 1.0. If you want it as a percentage, just multiply by 100, of course.
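
    As a standalone sanity check of that thresholding logic (made-up values, run in a throwaway session):

    import tensorflow as tf

    prediction = tf.constant([[0.9], [0.2], [0.6], [0.4]])  # hypothetical sigmoid outputs
    y = tf.constant([[1.0], [0.0], [0.0], [1.0]])           # true 0/1 labels

    predicted_class = tf.greater(prediction, 0.5)
    correct = tf.equal(predicted_class, tf.equal(y, 1.0))
    accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

    with tf.Session() as sess:
        print(sess.run(accuracy))  # 0.5: two of the four predictions are correct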