I am deploying my conv-deconv net. My problem is that the cross entropy is always NaN during training, so the solver never updates the weights. I checked my code all day but couldn't figure out where I went wrong. The following is my architecture:
Here is my cross-entropy function:
ys_reshape = tf.reshape(ys,[-1,1])
prediction = tf.reshape(relu4,[-1,1])
cross_entropy = tf.reduce_mean(-(ys_reshape*tf.log(prediction)))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
where the dimension of ys is [1,500,500,1], ys_reshape is [250000,1], relu4 is [1,500,500,1], and prediction is [250000,1]. The values of the label matrix ys are {0,1}; this is a two-category dense prediction.
If I print train_step, it displays None. Can anyone help me?
You did a great job of narrowing the problem down to the right couple of lines of code.
So your predicted probability is directly the output of relu4?
There are two problems with that. First, it can be greater than one. Second, it can be exactly zero (anywhere the input to relu4 is negative, its output will be zero), and log(0) -> NaN.
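To see where the NaN actually comes from, here is a minimal sketch (toy constants rather than your network) of the failure mode: for a pixel where the label is 0 and the ReLU output is exactly 0, the term 0 * log(0) evaluates to 0 * -inf = NaN, and tf.reduce_mean then propagates it to the whole loss.

import tensorflow as tf

# Toy stand-ins for ys_reshape and prediction: the second pixel has
# label 0 and a ReLU output of exactly 0.
labels = tf.constant([[1.0], [0.0]])
preds = tf.constant([[0.5], [0.0]])

per_pixel = -(labels * tf.log(preds))  # 0 * log(0) = 0 * -inf = NaN
loss = tf.reduce_mean(per_pixel)

with tf.Session() as sess:
    print(sess.run(per_pixel))  # [[0.693...], [nan]]
    print(sess.run(loss))       # nan -- one bad pixel poisons the mean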
The usual approach to this is to treat the linear activations (no ReLU) as the log-odds of each class. A naive implementation of that (apply the sigmoid yourself, then take the log) is always broken by numerical issues. Since you have a single class, you should use tf.nn.sigmoid_cross_entropy_with_logits.
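As a minimal sketch of what that change could look like (conv4_logits is a hypothetical name for the linear output that currently feeds relu4, and this assumes ys is a float tensor):

logits = tf.reshape(conv4_logits, [-1, 1])  # pre-ReLU output of the last layer, [250000, 1]
labels = tf.reshape(ys, [-1, 1])            # {0, 1} labels as floats, [250000, 1]

# Numerically stable: works directly on the logits, no ReLU, no explicit log.
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)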
And for the training op returning None: there is a subtle distinction here between ops and tensors. Try print(train_step) and print(cross_entropy). Evaluating an op does something, while evaluating a tensor gets you a value. So if you're looking for the value of the cross entropy that was calculated on the forward pass, just do something like _, loss_value = sess.run([train_step, cross_entropy]).
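For example, a training loop along these lines runs the update and gets the scalar loss back in one call (xs, image_batch, label_batch and num_steps are placeholder names for whatever your inputs are called):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        # train_step is an op: running it applies the Adam update and returns None.
        # cross_entropy is a tensor: running it returns the current loss value.
        _, loss_value = sess.run([train_step, cross_entropy],
                                 feed_dict={xs: image_batch, ys: label_batch})
        print(step, loss_value)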