Tags: python, pandas, dataframe, nan, correlation

DataFrame correlation produces NaN although its values are all integers


I have a dataframe df that looks like:

                          0              1       2  3  4  5   6  7   8
0  2014-03-19T12:44:32.695Z  1395233072695  703425  0  2  1  13  5  21
1  2014-03-19T12:44:32.727Z  1395233072727  703425  0  2  1  13  5  21

The columns are all type int (except the first one):

0     object
1      int64
2      int64
3      int64
4      int64
5      int64
6      int64
7      int64
8      int64

But in the correlation matrix, some columns are NaN. When I call df.corr(), I get the following output:

          1    2         3          4    5    6          7         8
1  1.000000  NaN  0.018752  -0.550307  NaN  NaN   0.075191  0.775725
2       NaN  NaN       NaN        NaN  NaN  NaN        NaN       NaN
3  0.018752  NaN  1.000000  -0.067293  NaN  NaN  -0.579651  0.004593
...

Solution

  • Those columns do not change in value right now, yes

    As Joris points out, you would expect NaN if the values do not vary. To see why, take a look at the correlation formula:

    cor(i,j) = cov(i,j)/[stdev(i)*stdev(j)]
    

    If the values of the ith or jth variable do not vary, then the respective standard deviation is zero, and so is the denominator of the fraction. The covariance in the numerator is zero as well, so the correlation is the undefined ratio 0/0, which is reported as NaN.
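
    As a rough illustration, the sketch below reproduces the effect on a small made-up frame (the column names "a", "b", "c" and their values are hypothetical, not the asker's data). Column "b" is constant, so every correlation involving it comes out as NaN. One possible workaround, assuming constant columns are of no interest, is to drop them before calling corr():

```python
import pandas as pd

# Hypothetical data: column "b" never changes, like column 2 in the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [7, 7, 7, 7],
    "c": [2, 1, 4, 3],
})

# The row and column for "b" are NaN, including the diagonal entry.
corr = df.corr()
print(corr)

# A column with a single distinct value has zero standard deviation.
constant_cols = [col for col in df.columns if df[col].nunique() <= 1]
print("constant columns:", constant_cols)

# Dropping those columns first leaves a correlation matrix without NaN.
corr_clean = df.drop(columns=constant_cols).corr()
print(corr_clean)
```

    Whether dropping is appropriate depends on the data: a column that is constant "right now" may start to vary once more rows arrive, at which point its correlations become defined again.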