I am deploying my conv-deconv net. My problem is that the cross entropy is always NaN during training, so the solver never updates the weights. I checked my code all day but couldn't figure out where I went wrong. The following is my architecture:
Here is my cross-entropy function:
ys_reshape = tf.reshape(ys,[-1,1])
prediction = tf.reshape(relu4,[-1,1])
cross_entropy = tf.reduce_mean(-(ys_reshape*tf.log(prediction)))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
where the dimension of ys is [1,500,500,1], ys_reshape is [250000,1], relu4 is [1,500,500,1], and prediction is [250000,1]. The values of the label matrix ys are {0,1}; this is a two-category dense prediction.
If I print train_step, it displays None. Can anyone help me?
You did a great job of narrowing the problem down to the right couple of lines of code.
So your predicted probability is directly the output of relu4?
There are two problems with that. First, it can be greater than one. Second, it can be exactly zero (anywhere the input to relu4 is negative, its output will be zero), and log(0) -> NaN.
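To see where the NaN actually comes from, here is a minimal sketch (toy constants rather than your network) of the failure mode: for a pixel where the label is 0 and the ReLU output is exactly 0, the term 0 * log(0) evaluates to 0 * -inf = NaN, and tf.reduce_mean then propagates it to the whole loss.

import tensorflow as tf

# Toy stand-ins for ys_reshape and prediction: the second pixel has
# label 0 and a ReLU output of exactly 0.
labels = tf.constant([[1.0], [0.0]])
preds = tf.constant([[0.5], [0.0]])

per_pixel = -(labels * tf.log(preds))  # 0 * log(0) = 0 * -inf = NaN
loss = tf.reduce_mean(per_pixel)

with tf.Session() as sess:
    print(sess.run(per_pixel))  # [[0.693...], [nan]]
    print(sess.run(loss))       # nan -- one bad pixel poisons the mean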
The usual approach to this is to treat the linear activations (no ReLU) as the log-odds of each class. A naive implementation of that (apply the sigmoid yourself, then take the log) is always broken by numerical issues. Since you have a single class, you should use tf.nn.sigmoid_cross_entropy_with_logits.
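As a minimal sketch of what that change could look like (conv4_logits is a hypothetical name for the linear output that currently feeds relu4, and this assumes ys is a float tensor):

logits = tf.reshape(conv4_logits, [-1, 1])  # pre-ReLU output of the last layer, [250000, 1]
labels = tf.reshape(ys, [-1, 1])            # {0, 1} labels as floats, [250000, 1]

# Numerically stable: works directly on the logits, no ReLU, no explicit log.
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)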
And for the training op returning None: there is a subtle distinction here between ops and tensors. Try print(train_step) and print(cross_entropy). Evaluating an op does something, while evaluating a tensor gets you a value. So if you're looking for the value of the cross entropy that was calculated on the forward pass, just do something like _, loss_value = sess.run([train_step, cross_entropy]).
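For example, a training loop along these lines runs the update and gets the scalar loss back in one call (xs, image_batch, label_batch and num_steps are placeholder names for whatever your inputs are called):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        # train_step is an op: running it applies the Adam update and returns None.
        # cross_entropy is a tensor: running it returns the current loss value.
        _, loss_value = sess.run([train_step, cross_entropy],
                                 feed_dict={xs: image_batch, ys: label_batch})
        print(step, loss_value)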