I have a table of values with two category fields and a single count. I need to compute the correlation among rows of one category based on their counts across the other category.
So for instance:
Category_A|Category_B|Count
Alan |Turkey |7
Alan |Ham |1
Alan |Spam |0
...
Bob |Turkey |2
Bob |Ham |9
Bob |Spam |12
...
I need to get Pearson's correlation with p-value in a table like this:
Category_A_1|Category_A_2|Correlation|P Value
Alan |Bob |0.7 |0.07
I can't figure out how to do this in Pandas. Please help. Thanks!
corrs = df.pivot('Category_A','Category_B').T.corr().stack()
#Category_A Category_A
#Alan Alan 1.000000
# Bob -0.986552
#Bob Alan -0.986552
# Bob 1.000000
corrs.index.names = 'A','B'
corrs.reset_index()
# A B 0
#0 Alan Alan 1.000000
#1 Alan Bob -0.986552
#2 Bob Alan -0.986552
#3 Bob Bob 1.000000
Unfortunately, Pandas does not have tools for calculating the p-values.