I have:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, np.nan, 435, 546],  # np.NaN was removed in NumPy 2.0
                   'B': [10, 2, 3, 4, 867, 23],
                   'C': [4, 5, np.nan, np.nan, np.nan, 64]})
df
       A    B     C
0    1.0   10   4.0
1    2.0    2   5.0
2    3.0    3   NaN
3    NaN    4   NaN
4  435.0  867   NaN
5  546.0   23  64.0
I compute the correlation with df.corr(), which returns the correlation matrix. According to the documentation, the correlation excludes NaN values, so correlation(A,B) is computed from 5 values, while correlation(A,C) is computed from only 3.
I ran this to get the number of elements used for each pairing:
for i in range(df.shape[1]):
    for j in range(df.shape[1]):
        if j == i:
            continue
        # keep only the rows where both columns of the pair are non-NaN
        print(df.columns[i], df.columns[j], df.iloc[:, np.r_[i, j]].dropna().shape)
A B (5, 2)
A C (3, 2)
B A (5, 2)
B C (3, 2)
C A (3, 2)
C B (3, 2)
How can I transform that so that I get these counts in a matrix similar to the one returned by df.corr()?
          A         B         C
A  1.000000  0.508726  0.999916
B  0.508726  1.000000  0.920458
C  0.999916  0.920458  1.000000
Are you looking for the number of rows in which both columns are non-NaN? If so:
s = df.notna().astype(int)
s.T @ s
Output:
   A  B  C
A  5  5  3
B  5  6  3
C  3  3  3
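To see why the matrix product gives the pairwise counts: s holds 1 where a value is present and 0 where it is NaN, so entry (i, j) of s.T @ s sums s[i] * s[j] over the rows, which is 1 only when both columns have a value in that row. The diagonal is simply the non-NaN count of each column. A minimal sketch that rebuilds the frame from the question and cross-checks the product against an explicit pairwise count (the name counts is just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, np.nan, 435, 546],
                   'B': [10, 2, 3, 4, 867, 23],
                   'C': [4, 5, np.nan, np.nan, np.nan, 64]})

# 1 where a value is present, 0 where it is NaN
s = df.notna().astype(int)

# entry (i, j) = number of rows where columns i and j are both non-NaN
counts = s.T @ s

# cross-check every pair against a direct boolean count
for a in df.columns:
    for b in df.columns:
        both_present = (df[a].notna() & df[b].notna()).sum()
        assert counts.loc[a, b] == both_present

print(counts)
```

The result has the same row and column labels as df.corr(), so the two matrices can be compared cell by cell.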