How to calculate conditional probability of all possible pairs of columns?


I have the following boolean DataFrame in pandas:

      Cat  Dog  Mouse
Alex   1    0     1
Lola   0    0     1
Bob    1    1     1
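
For reproducibility, the table above can be rebuilt with a few lines. This is a minimal sketch: the variable name `df` is taken from the solution further down, and the 1/0 encoding is assumed to match the table as printed.

```python
import pandas as pd

# Rebuild the example table; 1 means the person owns that animal, 0 means not.
df = pd.DataFrame(
    {"Cat": [1, 0, 1], "Dog": [0, 0, 1], "Mouse": [1, 1, 1]},
    index=["Alex", "Lola", "Bob"],
)
print(df)
```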

Each cell is a true/false flag indicating whether a person owns that animal. I would like to get a DataFrame containing the conditional probability for each pair of animals, where the row gives the condition:

      Cat  Dog  Mouse
Cat    1   50%    1
Dog    1    1     1
Mouse 66%  33%    1
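
Read row-wise, each entry is the share of the row animal's owners who also own the column animal. A sketch of the arithmetic behind two of the entries, where N(·) is a count of people (notation introduced here for illustration only; the table shows 2/3 truncated to 66%):

```latex
P(B \mid A) = \frac{N(A \wedge B)}{N(A)}, \qquad
P(\text{Dog} \mid \text{Cat}) = \frac{1}{2} = 50\%, \qquad
P(\text{Cat} \mid \text{Mouse}) = \frac{2}{3} \approx 67\%
```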

Is there a fast way of doing this in pandas? If so, how?


Solution

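  • You can take the dot product of the transposed df with df, which counts how often each pair of columns is true together, and then rank each row as a percentage. The rank step matches the expected table for this data, but it is not a conditional probability in general: that would be each row of the count matrix divided by its diagonal entry (the number of rows where the condition holds). The rank-based version: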

    df.T.dot(df).rank(axis=1,method='dense',pct=True).round(3)
    

             Cat    Dog  Mouse
    Cat    1.000  0.500    1.0
    Dog    1.000  1.000    1.0
    Mouse  0.667  0.333    1.0
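
The dense rank only depends on the ordering of the counts in each row, so it can drift from the true ratios on other data. A hedged alternative, not part of the original answer, is to divide each row of the co-occurrence matrix by the number of owners of the row's animal. The names `co` and `cond` are illustrative, and the frame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {"Cat": [1, 0, 1], "Dog": [0, 0, 1], "Mouse": [1, 1, 1]},
    index=["Alex", "Lola", "Bob"],
)

# co.loc[a, b] = number of people who own both a and b;
# the diagonal co.loc[a, a] is the number of people who own a.
co = df.T.dot(df)

# Divide each row by the owner count of the row's animal: P(column | row).
# df.sum() is indexed by animal name, so axis=0 aligns it with the rows of co.
cond = co.div(df.sum(), axis=0)
print(cond.round(3))
```

On the example data this yields the same values as the table above. An animal that nobody owns would produce NaN in its row (0 divided by 0), which may need separate handling.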