python pandas probability

Calculate conditional probability using groupby and shift operations in Pandas dataframe


I have a dataframe of patients and their visits. For each visit, the presence of a disease in the right (R) and/or left (L) eye is labeled with a {0,1} value (0 = not present, 1 = present). The dataset looks like this:

Patient   R L
P_1       0 1
P_1       1 1
P_1       0 1
P_1       0 1
P_1       0 1
P_2       1 1
P_2       0 1
P_2       0 1
P_2       1 1
P_3       0 0
P_3       1 1
P_3       0 0
P_3       0 1
P_3       1 1
P_3       0 1

and so on.

How can I calculate, for example, the conditional probability P(R=1 | L=1) using groupby and shift operations in an elegant way?


Solution

  • IIUC, since R is 0/1, the mean of R within each L group is the conditional probability P(R=1 | L):

    df.groupby('L').R.mean()
    

    gives

    L
    0    0.000000
    1    0.384615
    Name: R, dtype: float64
    

    So the answer: P(R=1|L=1) ≈ 0.385, and P(R=1|L=0) = 0.

    Or, if we want to condition on the patient as well:

    df.groupby(['Patient','L']).R.mean()
    

    gives:

    Patient  L
    P_1      1    0.2
    P_2      1    0.5
    P_3      0    0.0
             1    0.5
    Name: R, dtype: float64
    

    so, for example, P(R=1|Patient=P_3, L=1) = 0.5.
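
For anyone who wants to check the numbers, here is a minimal sketch that rebuilds only the 15 sample rows shown in the question and compares the group mean against the definition P(R=1|L=1) = count(R=1 and L=1) / count(L=1). The `pd.crosstab` call is an alternative that the answer does not use, and the variable names are made up for illustration.

```python
import pandas as pd

# Only the 15 rows shown in the question; the real dataset is longer.
df = pd.DataFrame({
    "Patient": ["P_1"] * 5 + ["P_2"] * 4 + ["P_3"] * 6,
    "R": [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    "L": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
})

# By definition: count(R=1 and L=1) / count(L=1)
both = ((df["R"] == 1) & (df["L"] == 1)).sum()
p_by_definition = both / (df["L"] == 1).sum()

# Same quantity as the mean of the 0/1 column R within the L=1 group
p_by_groupby = df.groupby("L")["R"].mean().loc[1]

# Alternative (not from the answer): a full table of P(R | L),
# normalized so that each row (one value of L) sums to 1
cond_table = pd.crosstab(df["L"], df["R"], normalize="index")

# Per-patient version: P(R=1 | Patient, L)
per_patient = df.groupby(["Patient", "L"])["R"].mean()

print(p_by_definition, p_by_groupby)
print(cond_table)
print(per_patient)
```

On these 15 rows, all three approaches give 5/13 ≈ 0.385 for P(R=1|L=1). Note that a group mean says nothing about how many visits it is based on, so on a larger dataset it may be worth checking `df.groupby(['Patient','L']).size()` alongside it.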