
SciPy reports a different Spearman correlation than pandas


import scipy.stats

# n1 and n2 are existing pandas Series (printed below)
print(n1)
print(n2)
print(type(n1), type(n2))
print(scipy.stats.spearmanr(n1, n2))
print(n1.corr(n2, method="spearman"))

Output:

0    2317.0
1    2293.0
2    1190.0
3     972.0
4    1391.0
Name: r6000, dtype: float64
0.0    2317.0
1.0    2293.0
3.0    1190.0
4.0     972.0
5.0    1391.0
Name: 6000, dtype: float64
<class 'pandas.core.series.Series'> <class 'pandas.core.series.Series'>
SpearmanrResult(correlation=0.9999999999999999, pvalue=1.4042654220543672e-24)
0.7999999999999999

The problem is that SciPy reports a Spearman correlation of about 1.0 for these two series, while pandas reports about 0.8 for the same data.


Solution

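  • I made a copy of each series and called reset_index() on it before correlating them (reset_index(drop=True) keeps each result a Series rather than a DataFrame). That fixed it.

    The cause is pandas' automatic data alignment: Series.corr() pairs values by index label, not by position. The two indexes here differ (0, 1, 2, 3, 4 versus 0.0, 1.0, 3.0, 4.0, 5.0), so pandas compares only the values under the shared labels 0, 1, 3 and 4, which gives 0.8.

    SciPy does no index alignment; it likely just converts each series to a NumPy array and pairs the values by position, which gives 1.0 because the two value lists are identical.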
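
    The behaviour described above can be reproduced with a minimal sketch. The two series are rebuilt here from the printed output in the question, which is an assumption about how they were originally created; the names values, rho_scipy, rho_pandas, rho_fixed, left and right are illustrative only. SciPy is given the underlying arrays explicitly to make the positional pairing visible.

```python
import pandas as pd
from scipy import stats

# Rebuild the two Series from the printed output in the question (assumed construction).
values = [2317.0, 2293.0, 1190.0, 972.0, 1391.0]
n1 = pd.Series(values, index=[0, 1, 2, 3, 4], name="r6000")
n2 = pd.Series(values, index=[0.0, 1.0, 3.0, 4.0, 5.0], name=6000)

# SciPy only sees the values and pairs them by position.
rho_scipy, _ = stats.spearmanr(n1.to_numpy(), n2.to_numpy())

# pandas pairs values by index label; only labels 0, 1, 3 and 4 are shared.
rho_pandas = n1.corr(n2, method="spearman")

# Inner alignment shows which values pandas ends up comparing.
left, right = n1.align(n2, join="inner")
print(pd.DataFrame({"n1": left, "n2": right}))

# Dropping the old indexes gives both Series the same positional index,
# so pandas pairs by position as well.
rho_fixed = n1.reset_index(drop=True).corr(
    n2.reset_index(drop=True), method="spearman"
)

# Approximately 1.0, 0.8 and 1.0 respectively.
print(rho_scipy, rho_pandas, rho_fixed)
```

    Printing the aligned pair is a quick way to check whether two series share the index you expect before trusting any pandas statistic computed across them.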