I have a dataframe with one row per patient visit. For each visit, the presence of a disease in the right (R) and left (L) eye is labeled with a {0,1} value (0 = not present, 1 = present). The dataset looks like this:
Patient R L
P_1 0 1
P_1 1 1
P_1 0 1
P_1 0 1
P_1 0 1
P_2 1 1
P_2 0 1
P_2 0 1
P_2 1 1
P_3 0 0
P_3 1 1
P_3 0 0
P_3 0 1
P_3 1 1
P_3 0 1
and so on.....
How can I calculate, for example, the conditional probability P(R=1 | L=1) using groupby and shift operations in an elegant way?
If I understand correctly:
df.groupby('L').R.mean()
gives
L
0 0.000000
1 0.384615
Name: R, dtype: float64
So the answer is P(R=1|L=1) ≈ 0.385 (5 of the 13 visits with L=1 also have R=1) and P(R=1|L=0) = 0.
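The group mean works here because R only takes the values 0 and 1, so the mean of R within the L=1 group is exactly the fraction of those rows with R=1. A minimal sketch to check this, assuming the data consists only of the 15 rows shown in the question (the rows behind "and so on" are not included, and the names `p_by_mean` and `p_direct` are only illustrative):

```python
import pandas as pd

# Only the rows shown in the question; the remaining rows are unknown.
df = pd.DataFrame({
    "Patient": ["P_1"] * 5 + ["P_2"] * 4 + ["P_3"] * 6,
    "R": [0, 1, 0, 0, 0,  1, 0, 0, 1,  0, 1, 0, 0, 1, 0],
    "L": [1, 1, 1, 1, 1,  1, 1, 1, 1,  0, 1, 0, 1, 1, 1],
})

# Mean of a 0/1 column within each L group = P(R=1 | L=l).
p_by_mean = df.groupby("L")["R"].mean()

# The same quantity from the definition: count(R=1 and L=1) / count(L=1).
both = ((df["R"] == 1) & (df["L"] == 1)).sum()
p_direct = both / (df["L"] == 1).sum()

print(p_by_mean)
print(p_direct)
```

Both routes should agree on this sample, since they count the same rows.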
Or, if we want the probability per patient as well:
df.groupby(['Patient','L']).R.mean()
gives:
Patient L
P_1 1 0.2
P_2 1 0.5
P_3 0 0.0
1 0.5
Name: R, dtype: float64
So, for example, P(R=1|Patient=P_3, L=1) = 0.5.
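To pull out a single per-patient value, or to see all patients side by side, the grouped result can be indexed or unstacked. This is a sketch under the same assumption as before (only the 15 rows shown in the question); `per_patient` and `table` are illustrative names:

```python
import pandas as pd

# Only the rows shown in the question; the remaining rows are unknown.
df = pd.DataFrame({
    "Patient": ["P_1"] * 5 + ["P_2"] * 4 + ["P_3"] * 6,
    "R": [0, 1, 0, 0, 0,  1, 0, 0, 1,  0, 1, 0, 0, 1, 0],
    "L": [1, 1, 1, 1, 1,  1, 1, 1, 1,  0, 1, 0, 1, 1, 1],
})

# P(R=1 | Patient=p, L=l) as a Series with a (Patient, L) MultiIndex.
per_patient = df.groupby(["Patient", "L"])["R"].mean()

# Look up one conditional probability by its (Patient, L) key.
p3_given_l1 = per_patient.loc[("P_3", 1)]

# One row per patient, one column per value of L.
# NaN marks combinations that never occur (e.g. P_1 has no visit with L=0),
# where the conditional probability is undefined.
table = per_patient.unstack("L")

print(p3_given_l1)
print(table)
```

In the unstacked table, a NaN is different from 0: it means the patient has no visits with that value of L, so there is nothing to condition on.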