After analyzing the dataset, how can we find the correlation of all attributes?
correlations = data.corr(method='pearson')
print(correlations >= 0.50)
I'm not getting the proper output.
Data:
import numpy as np
import pandas as pd
np.random.seed(4)
df = pd.DataFrame(np.random.standard_normal((1000, 5)))
df.columns = list("ABCDE")
df_cor = df.corr(method='pearson')
df.head():
          A         B         C         D         E
0  0.050562  0.499951 -0.995909  0.693599 -0.418302
1 -1.584577 -0.647707  0.598575  0.332250 -1.147477
2  0.618670 -0.087987  0.425072  0.332253 -1.156816
3  0.350997 -0.606887  1.546979  0.723342  0.046136
4 -0.982992  0.054433  0.159893 -1.208948  2.223360
df_cor:
          A         B         C         D         E
A  1.000000 -0.008658 -0.015977 -0.001219 -0.008043
B -0.008658  1.000000  0.037419 -0.055335  0.057751
C -0.015977  0.037419  1.000000  0.000049  0.057091
D -0.001219 -0.055335  0.000049  1.000000 -0.017879
E -0.008043  0.057751  0.057091 -0.017879  1.000000
# Check for correlations whose absolute value exceeds a threshold. The threshold here is `0.05` because the sample data is random; change it to `0.5` at your end.
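(df_cor[df_cor.abs() > .05]                         # keep only |r| > threshold
    .replace(1., np.nan)                            # blank out the diagonal (self-correlation)
    .dropna(how='all', axis=1)                      # drop columns with nothing left
    .dropna(how='all', axis=0)                      # drop rows with nothing left
    .apply(lambda x: x.dropna().to_dict(), axis=1)  # one dict per remaining row
    .to_dict())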
{'B': {'D': -0.0553348494117175, 'E': 0.057751329924049855},
'C': {'E': 0.057091148280687266},
'D': {'B': -0.0553348494117175},
'E': {'B': 0.057751329924049855, 'C': 0.057091148280687266}}
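The chain above removes the diagonal by matching the value `1`, which would also discard two different columns that happen to be perfectly correlated. As a minimal sketch of the same idea, the diagonal can be masked by position instead. The helper name `strong_correlations` and its `threshold` parameter are made up for illustration and are not part of pandas:

```python
import numpy as np
import pandas as pd


def strong_correlations(frame, threshold=0.5):
    """Return {column: {other_column: r}} for pairs with |r| > threshold."""
    cor = frame.corr(method='pearson')
    # Mask the diagonal by position rather than by the value 1.0, so an
    # off-diagonal correlation of exactly 1 is not thrown away.
    off_diagonal = ~np.eye(len(cor), dtype=bool)
    strong = cor.where((cor.abs() > threshold) & off_diagonal)
    # Build the nested dict, skipping columns that have no strong partner.
    return {
        col: row.dropna().to_dict()
        for col, row in strong.iterrows()
        if row.notna().any()
    }


# Same sample data as above; 0.05 is used only because the data is random.
np.random.seed(4)
df = pd.DataFrame(np.random.standard_normal((1000, 5)), columns=list("ABCDE"))
result = strong_correlations(df, threshold=0.05)
print(result)
```

On real data you would call it as `strong_correlations(data, threshold=0.5)`.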
If you need the output as a DataFrame:
df_cor[df_cor.abs() > .05].replace(1, np.nan)
    A         B         C         D         E
A NaN       NaN       NaN       NaN       NaN
B NaN       NaN       NaN -0.055335  0.057751
C NaN       NaN       NaN       NaN  0.057091
D NaN -0.055335       NaN       NaN       NaN
E NaN  0.057751  0.057091       NaN       NaN
After dropping the columns that contain no values:
df_cor[df_cor.abs() > .05].replace(1, np.nan).dropna(how='all', axis=1)
          B         C         D         E
A       NaN       NaN       NaN       NaN
B       NaN       NaN -0.055335  0.057751
C       NaN       NaN       NaN  0.057091
D -0.055335       NaN       NaN       NaN
E  0.057751  0.057091       NaN       NaN
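Because the correlation matrix is symmetric, every pair shows up twice in the outputs above (B/D and D/B, for example). If a flat list with each pair appearing once is easier to read, one possible sketch keeps only the upper triangle and stacks it into a Series. The helper name `correlated_pairs` is an assumption for illustration, not a pandas function:

```python
import numpy as np
import pandas as pd


def correlated_pairs(frame, threshold=0.5):
    """Return a Series of unique column pairs with |r| > threshold."""
    cor = frame.corr(method='pearson')
    # Keep the strict upper triangle (k=1): this excludes the diagonal and
    # keeps (B, D) without also keeping (D, B).
    upper = np.triu(np.ones(cor.shape, dtype=bool), k=1)
    pairs = cor.where(upper).stack()
    # Masked cells are NaN (if still present) and fail this comparison.
    pairs = pairs[pairs.abs() > threshold]
    # Strongest relationships first, regardless of sign.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)


# Same sample data as above; 0.05 is used only because the data is random.
np.random.seed(4)
df = pd.DataFrame(np.random.standard_normal((1000, 5)), columns=list("ABCDE"))
pairs = correlated_pairs(df, threshold=0.05)
print(pairs)
```

The result is indexed by a (row label, column label) pair, so `pairs.reset_index()` turns it into a three-column DataFrame if that is more convenient.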