python pandas frequency-analysis frequency-distribution

How to create a Frequency Distribution Matrix from a Pandas DataFrame of boolean values

In short, I'm trying to convert a DataFrame like this:

Patient   Cough   Headache   Dizzy
   1        1         0        0 
   2        1         1        1
   3        0         1        0 
   4        1         0        1
   5        0         1        0

into a frequency distribution matrix, laid out like the output of the pandas correlation feature (DataFrame.corr).

That is, it would return something like this, where each cell is the fraction of patients with the row symptom who also have the column symptom:

        Cough   Headache   Dizzy
Cough     1       0.33     0.66
Headache 0.33       1      0.33
Dizzy     1       0.5       1

This is because 1 in 3 people with a Headache were Dizzy, whereas 1 in 2 people who were Dizzy had a Headache, and so on.
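Written as a formula, the rule the example implies treats the row symptom i as the condition (my reading of the table above, checked against its entries):

```latex
M_{ij} = P(j \mid i)
       = \frac{\lvert\{\text{patients with } i \text{ and } j\}\rvert}
              {\lvert\{\text{patients with } i\}\rvert}

M_{\text{Headache},\,\text{Dizzy}} = \tfrac{1}{3}, \qquad
M_{\text{Dizzy},\,\text{Headache}} = \tfrac{1}{2}, \qquad
M_{\text{Dizzy},\,\text{Cough}} = \tfrac{2}{2} = 1
```

The diagonal is always 1, and the matrix is generally not symmetric because the denominators differ from row to row.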

My actual data is much larger, so I'm curious whether pandas has a built-in way to do this.

Solution

Something like this?

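# extract the symptom columns (everything after Patient); cast to int so
# that True/False columns are counted by the matrix product
s = df.iloc[:, 1:].astype(int)

# (s.T @ s) counts co-occurrences; dividing by s.sum() scales each column j
# by its total, and the final .T turns that into P(column | row)
((s.T @ s) / s.sum()).T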

Output:

             Cough  Headache     Dizzy
Cough     1.000000  0.333333  0.666667
Headache  0.333333  1.000000  0.333333
Dizzy     1.000000  0.500000  1.000000
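To see why this works, here is a minimal self-contained sketch using the question's sample data. s.T @ s is a co-occurrence matrix: entry (i, j) is the number of patients with both symptoms i and j. Dividing each row i by the total for symptom i gives P(j | i) directly, so DataFrame.div with axis=0 produces the same matrix without the transpose. The .astype(int) is a precaution, on the assumption that real data may be stored as True/False: NumPy's matrix product on boolean arrays returns booleans rather than counts. The names counts, freq and check are only for illustration.

```python
import pandas as pd

# Sample data from the question (Patient kept as an ordinary column)
df = pd.DataFrame({
    "Patient": [1, 2, 3, 4, 5],
    "Cough": [1, 1, 0, 1, 0],
    "Headache": [0, 1, 1, 0, 1],
    "Dizzy": [0, 1, 0, 1, 0],
})

# Symptom columns only, as integers so the matrix product counts
s = df.drop(columns="Patient").astype(int)

# counts.loc[i, j] = number of patients who have both symptom i and symptom j
counts = s.T @ s

# Divide each row i by the number of patients with symptom i -> P(j | i)
freq = counts.div(s.sum(), axis=0)

# Same result as the answer's expression
answer = ((s.T @ s) / s.sum()).T

# Brute-force check of one entry: share of Dizzy patients who had a Headache
dizzy = s["Dizzy"] == 1
check = s.loc[dizzy, "Headache"].mean()

print(freq.round(3))
print(check)
```

The div(axis=0) form states the intent (normalise by the row symptom) more explicitly, while the answer's version relies on a Series dividing a DataFrame column-wise by default and then transposes.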