I'm trying binary classification with simple logistic regression. My output labels are {1, 0} (whether the student passed the exam or not), but the cost function returns NaN. What is wrong?
import numpy
import tensorflow as tf

learning_rate = 0.05
total_iterations = 1500
display_per = 100

data = numpy.loadtxt("ex2data1.txt", dtype=numpy.float32, delimiter=",")
training_X = numpy.asarray(data[:, [0, 1]])  # 100 x 2 matrix of exam scores, e.g. [98.771, 4.817]
training_Y = numpy.asarray(data[:, [2]], dtype=numpy.int)  # 100 x 1 labels, e.g. [1], [0], [0], [1]
m = data.shape[0]

x_i = tf.placeholder(tf.float32, [None, 2])  # None x 2
y_i = tf.placeholder(tf.float32, [None, 1])  # None x 1
W = tf.Variable(tf.zeros([2, 1]))  # 2 x 1
b = tf.Variable(tf.zeros([1]))  # 1 x 1
h = tf.nn.softmax(tf.matmul(x_i, W) + b)
cost = tf.reduce_sum(tf.add(tf.multiply(y_i, tf.log(h)), tf.multiply(1 - y_i, tf.log(1 - h)))) / -m
I tried this simple logistic cost function and it returned NaN. I thought my cost function was garbage, so I switched to the cost function from TensorFlow's examples:
cost = tf.reduce_mean(-tf.reduce_sum(y_i*tf.log(h), reduction_indices=1))
but that didn't work either.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("cost: ", sess.run(cost, feed_dict={x_i: training_X, y_i: training_Y}),
          "w: ", sess.run(W), "b: ", sess.run(b))
The function tf.nn.softmax expects the number of logits (the last dimension) to equal the number of classes (2 in your case, {1, 0}). Since the last dimension in your case is 1, softmax will always return 1: the probability of being in the only available class is always 1, since no other class exists. Therefore h is a tensor filled with 1's, and tf.log(1-h) returns negative infinity. Infinity multiplied by zero (1-y_i in the rows where y_i is 1) returns NaN.
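You can see this concretely with a small repro sketch (assuming TF 1.x, to match the tf.placeholder / tf.Session style in your code; the constants are made up for illustration):

import tensorflow as tf

logits = tf.constant([[0.3], [-2.0], [5.0]])  # shape (3, 1): only one "class"
h = tf.nn.softmax(logits)                     # normalizes over the last axis of size 1
with tf.Session() as sess:
    print(sess.run(h))                        # [[1.], [1.], [1.]]
    print(sess.run(tf.log(1 - h)))            # [[-inf], [-inf], [-inf]]
    # 0 * -inf = nan under IEEE float rules, and it propagates through the sum:
    print(sess.run(tf.reduce_sum(0.0 * tf.log(1 - h))))  # nan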
You should replace tf.nn.softmax with tf.nn.sigmoid.
A possible fix is:

h = tf.nn.sigmoid(tf.matmul(x_i, W) + b)
cost = tf.reduce_sum(tf.add(tf.multiply(y_i, tf.log(h)), tf.multiply(1 - y_i, tf.log(1 - h)))) / -m
Or better, you can use tf.nn.sigmoid_cross_entropy_with_logits. In that case, it should be done as follows:

h = tf.matmul(x_i, W) + b
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_i, logits=h))
This function is more numerically stable than applying tf.nn.sigmoid followed by the cross-entropy function, which can return NaN when tf.nn.sigmoid gets near 0 or 1 due to the imprecision of float32.
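For completeness, here is a minimal end-to-end sketch of the corrected version (hypothetical, assuming TF 1.x and the same ex2data1.txt layout as in the question; the feature-scaling line is my addition, not part of your original code):

import numpy
import tensorflow as tf

learning_rate = 0.05
total_iterations = 1500
display_per = 100

data = numpy.loadtxt("ex2data1.txt", dtype=numpy.float32, delimiter=",")
training_X = data[:, [0, 1]]  # 100 x 2 exam scores
training_Y = data[:, [2]]     # 100 x 1 labels in {0, 1}

# Not in the original question: scaling the ~0-100 exam scores keeps the
# gradients well-conditioned for plain gradient descent at this learning rate.
training_X = (training_X - training_X.mean(axis=0)) / training_X.std(axis=0)

x_i = tf.placeholder(tf.float32, [None, 2])
y_i = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))

logits = tf.matmul(x_i, W) + b  # raw scores; no sigmoid here, the loss applies it internally
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_i, logits=logits))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(total_iterations):
        _, c = sess.run([train_op, cost], feed_dict={x_i: training_X, y_i: training_Y})
        if (step + 1) % display_per == 0:
            print("step:", step + 1, "cost:", c)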