python python-3.x tensorflow scipy metrics

Why is the Pearson correlation different between TensorFlow and SciPy?

I compute the Pearson correlation in two ways:

In TensorFlow, I use the following metric:

tf.contrib.metrics.streaming_pearson_correlation(y_pred, y_true)

When I evaluate my network on the test data, I get the following results:

loss = 0.5289223349094391

pearson = 0.3701728057861328

(Loss is mean_squared_error)

Then I predict on the test data and compute the same metrics with SciPy:

import numpy as np
import scipy.stats as measures

per_coef = measures.pearsonr(y_pred, y_true)[0]
mse_coef = np.mean(np.square(np.array(y_pred) - np.array(y_true)))

And I get the following results:

Pearson = 0.5715300096509959

MSE = 0.5289223312665985

Is this a known issue? Is it normal?

Minimal, complete and verifiable example

import tensorflow as tf
import scipy.stats as measures

y_pred = [2, 2, 3, 4, 5, 5, 4, 2]
y_true = [1, 2, 3, 4, 5, 6, 7, 8]

## Scipy
val2 = measures.pearsonr(y_pred, y_true)[0]
print("Scipy's Pearson = {}".format(val2))

## Tensorflow
logits = tf.placeholder(tf.float32, [8])
labels = tf.to_float(tf.Variable(y_true))

acc, acc_op = tf.contrib.metrics.streaming_pearson_correlation(logits,labels)

sess = tf.Session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
sess.run(acc, {logits:y_pred})
sess.run(acc_op, {logits:y_pred})

print("Tensorflow's Pearson:{}".format(sess.run(acc,{logits:y_pred})))

Solution

In the minimal verifiable example you gave, y_pred and y_true are lists of integers. In the first line of the scipy.stats.pearsonr source (scipy.stats is imported as measures in your code), you will see that the inputs are converted to NumPy arrays with x = np.asarray(x). We can see the resulting data type of these arrays via:

print(np.asarray(y_pred).dtype)  # Prints 'int64'

When dividing two int64 numbers, SciPy uses float64 precision, while TensorFlow uses float32 precision in the example above. The rounding difference already shows up around the eighth decimal place for a single division:

>>> '%.15f' % (8.5 / 7)
'1.214285714285714'
>>> '%.15f' % (np.array(8.5, dtype=np.float32) / np.array(7, dtype=np.float32))
'1.214285731315613'
>>> '%.15f' % (np.array(8.5, dtype=np.float32) / np.array(7, dtype=np.float32) - 8.5 / 7)
'0.000000017029899'
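To see how this rounding carries through to the coefficient itself, here is a minimal NumPy-only sketch. The helper pearson is a hypothetical name used for illustration; it evaluates the textbook formula at a chosen dtype and is not SciPy's or TensorFlow's actual implementation.

```python
import numpy as np

def pearson(x, y, dtype):
    """Textbook Pearson r computed at the given dtype (illustrative helper)."""
    x = np.asarray(x, dtype=dtype)
    y = np.asarray(y, dtype=dtype)
    xm = x - x.mean()  # deviations from the mean
    ym = y - y.mean()
    return (xm * ym).sum() / np.sqrt((xm * xm).sum() * (ym * ym).sum())

# Example data from the question
y_pred = [2, 2, 3, 4, 5, 5, 4, 2]
y_true = [1, 2, 3, 4, 5, 6, 7, 8]

r64 = float(pearson(y_pred, y_true, np.float64))
r32 = float(pearson(y_pred, y_true, np.float32))

print("float64:  {:.15f}".format(r64))
print("float32:  {:.15f}".format(r32))
print("abs diff: {:.3e}".format(abs(r32 - r64)))
```

On this data the two results should agree to several significant digits, which gives a feel for the size of discrepancy that precision alone can produce.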

You can get the same results for SciPy and TensorFlow by using float32 precision for y_pred and y_true:

import numpy as np
import tensorflow as tf
import scipy.stats as measures

y_pred = np.array([2, 2, 3, 4, 5, 5, 4, 2], dtype=np.float32)
y_true = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)

## Scipy
val2 = measures.pearsonr(y_pred, y_true)[0]
print("Scipy's Pearson: \t\t{}".format(val2))

## Tensorflow
logits = tf.placeholder(tf.float32, [8])
labels = tf.to_float(tf.Variable(y_true))

acc, acc_op = tf.contrib.metrics.streaming_pearson_correlation(logits,labels)

sess = tf.Session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
sess.run(acc, {logits:y_pred})
sess.run(acc_op, {logits:y_pred})

print("Tensorflow's Pearson: \t{}".format(sess.run(acc,{logits:y_pred})))

prints

Scipy's Pearson:        0.38060760498046875
Tensorflow's Pearson:   0.38060760498046875

Differences between SciPy's and TensorFlow's computation

In the test scores you report, the difference is much larger (0.37 vs. 0.57), too large to be explained by floating-point precision alone. I took a look at the source and found the following differences:

1. Update ops

The result of tf.contrib.metrics.streaming_pearson_correlation is stateful. It returns the correlation coefficient op together with an update_op that accumulates statistics from new incoming data. The coefficient op only reads that accumulated state, so if the update op has been run on other data beforehand, the result is completely different from the correlation of y_pred alone:

# The metric's state lives in local variables, so this call does not reset it
sess.run(tf.global_variables_initializer())

for _ in range(20):
    sess.run(acc_op, {logits: np.random.randn(*y_pred.shape)})

print("Tensorflow's Pearson: \t{}".format(sess.run(acc,{logits:y_pred})))

prints

Scipy's Pearson:        0.38060760498046875
Tensorflow's Pearson:   -0.0678008571267128

2. Different formulae

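SciPy:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$

TensorFlow:

$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{cov}(x, x)}\,\sqrt{\operatorname{cov}(y, y)}}$$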

While mathematically the same, the computation of the correlation coefficient is different in TensorFlow. It uses the sample covariance for (x, x), (x, y) and (y, y) to compute the correlation coefficient, which can introduce different rounding errors.
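As a rough check that the two routes agree up to rounding, the following NumPy sketch computes the coefficient once from centered sums and once from sample covariances via np.cov. It illustrates the two formulae only and does not reproduce either library's code path.

```python
import numpy as np

x = np.array([2, 2, 3, 4, 5, 5, 4, 2], dtype=np.float64)
y = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float64)

# Route 1: centered sums
xm = x - x.mean()
ym = y - y.mean()
r_sums = (xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum())

# Route 2: sample covariances; np.cov normalizes by n - 1 by default
c = np.cov(x, y)  # 2x2 matrix: [[cov(x,x), cov(x,y)], [cov(y,x), cov(y,y)]]
r_cov = c[0, 1] / (np.sqrt(c[0, 0]) * np.sqrt(c[1, 1]))

print("centered sums: {:.15f}".format(r_sums))
print("covariances:   {:.15f}".format(r_cov))
print("abs diff:      {:.3e}".format(abs(r_sums - r_cov)))
```

The n - 1 factors cancel in the ratio, so any remaining difference comes from the order of floating-point operations, and it is far smaller than the gap reported in the question.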