Does TensorFlow compute the cross entropy only in single precision?

I am trying to fully understand how TensorFlow computes the cross entropy. In the code below, I use numpy to generate double-precision random data x, transform it to logits for binary classification (i.e., only one logit per data point), map it through a sigmoid to get sig, compute the cross entropy ce, and then the mean cross entropy mce. The analogous computations in TensorFlow follow further down. My question is:

Why do I get a discrepancy between my mean cross entropy mce (computed with double precision in numpy) and the TensorFlow tf.losses.sigmoid_cross_entropy?

I don't know where I forgot to tell TensorFlow to compute in double precision. Furthermore, if I apply tf.reduce_mean to the cross entropy computed per data point (see the computation of mcetf2), I recover my numpy result. Where does the discrepancy come from? Thank you!

import numpy as np
import tensorflow as tf

#%%

# Number of data points nx and dimension dx
nx = 10
dx = 4

# Input data
x = np.random.rand(nx,dx)

#%% Numpy

# Transform to logits for binary classification with sigmoid
matrix = np.random.rand(dx,1)
logits = np.matmul(x,matrix)
print('Logits dimensions: %s' % str(logits.shape))

# Sigmoid
def sigmoid(x):
    return 1. / (1. + np.exp(-x))
sig = sigmoid(logits)
print('Sigmoid dimensions: %s' % str(sig.shape))

# Discrete probabilities
p = np.random.randint(2,size=nx)[:,np.newaxis]
print('Probability dimensions: %s'% str(p.shape))

# Cross entropy for each data point
ce = p*np.log(1/sig)+(1-p)*np.log(1/(1-sig))

# Mean cross entropy
mce = np.mean(ce)
print('MCE with np: %.16f' % mce)

#%% Tensorflow

xp = tf.placeholder(dtype=tf.float64,shape=[None,dx])
pp = tf.placeholder(dtype=tf.float64,shape=[None,1])

model = xp
c1 = tf.constant(matrix,dtype=tf.float64)
model = tf.matmul(xp,c1)
sigtf = tf.nn.sigmoid(model)
cetf = tf.nn.sigmoid_cross_entropy_with_logits(labels=pp,logits=model)
mcetf = tf.losses.sigmoid_cross_entropy(pp,model)
mcetf2 = tf.reduce_mean(cetf)

sess = tf.Session()
feed = {xp:x,pp:p}
print('Error in logits: %.16f' % np.max(np.abs(sess.run(model,feed)-logits)))
print('Error in sigmoid: %.16f' % np.max(np.abs(sess.run(sigtf,feed)-sig)))
print('Error in CE: %.16f' % np.max(np.abs(sess.run(cetf,feed)-ce)))
print('Error in MCE: %.16f' % np.abs(sess.run(mcetf,feed)-mce))
print('Error in MCE2: %.16f' % np.abs(sess.run(mcetf2,feed)-mce))
sess.close()

Logits dimensions: (10, 1)
Sigmoid dimensions: (10, 1)
Probability dimensions: (10, 1)
MCE with np: 0.7413128316195762
Error in logits: 0.0000000000000000
Error in sigmoid: 0.0000000000000000
Error in CE: 0.0000000000000009
Error in MCE: 0.0000000297816550
Error in MCE2: 0.0000000000000001

Solution

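The use of 32-bit floats appears to be hard-coded in the compute_weighted_loss() function that sigmoid_cross_entropy uses in TensorFlow. An error of about 3e-8 on a value near 0.74 is consistent with single-precision rounding, since float32 carries only about 7 significant decimal digits.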
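One way to check that a float32 round trip explains an error of this size is to reproduce it in numpy alone. The sketch below is not TensorFlow's code: it only assumes that the per-example losses are cast to float32, averaged, and cast back to float64. The helper name mean_via_float32 and the stand-in data ce_demo are made up for illustration.

```python
import numpy as np

def mean_via_float32(values):
    """Average in float32, then cast the result back to float64.

    Hypothetical helper: it mimics a float32 round trip and is not
    TensorFlow's actual code path.
    """
    return np.float64(np.mean(values.astype(np.float32)))

rng = np.random.default_rng(0)
# Stand-in per-example cross entropies with the same shape as ce in the question
ce_demo = rng.uniform(0.3, 1.5, size=(10, 1))

mean64 = np.mean(ce_demo)            # pure double-precision mean
mean32 = mean_via_float32(ce_demo)   # mean taken in single precision

print('float64 mean:   %.16f' % mean64)
print('float32 mean:   %.16f' % mean32)
print('abs difference: %.16f' % abs(mean64 - mean32))
```

The difference should typically land somewhere around 1e-8 to 1e-7, the same order as the "Error in MCE" line, whereas a mean taken entirely in float64 agrees to roughly 1e-16, as in "Error in MCE2".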

As a minor point, your numpy code for calculating ce isn't very numerically stable, but that won't affect anything here. I'd implement it as:

ce = p * -np.log(sig) + (1-p) * -np.log1p(-sig)

The use of log1p is the main change. Your use of 1 - sig loses all precision as sig approaches zero, because 1 - sig then rounds to exactly 1 and its logarithm comes out as 0.
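To make the precision loss concrete, the snippet below compares the two forms for a very small sigmoid output. The value 1e-20 and the variable names are chosen purely for illustration; nothing in the question's random data gets anywhere near this regime.

```python
import numpy as np

# Illustrative value, far below float64 machine epsilon (about 2.2e-16)
sig_small = 1e-20

# Form used in the question: 1 - sig_small rounds to exactly 1.0,
# so the contribution of sig_small disappears before the log is taken
naive = np.log(1 / (1 - sig_small))

# log1p(-s) evaluates log(1 - s) without forming 1 - s explicitly
stable = -np.log1p(-sig_small)

print('naive : %.3e' % naive)
print('stable: %.3e' % stable)
```

For small s, -log(1 - s) is approximately s, so the stable form keeps the information that the naive form rounds away.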