Tags: python-3.x, pandas, probability

calculate conditional probability


Input

cust_Id  category  product  purchased
1        Elec      light    0    
1        Elec      light    1
1        Elec      light    0
1        HA        Table    1
1        HH        Pen      1
2        Elec      light    0
2        HA        Table    1
3        HH        Pen      0
3        Elec      light    1

I want to find the best (customer, category, product) combination, i.e. the one with the highest probability of purchase.


Solution

  • Try this:

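    # Group by the three attributes; df is the input table above
    grp = df.groupby(['cust_Id', 'category', 'product'])
    # Purchases divided by observations per group (same as grp.mean() for a 0/1 column)
    prob = grp.sum() / grp.count()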
    

    The result is the estimated probability of a purchase for each combination of the three attributes:

                              purchased
    cust_Id category product           
    1       Elec     light     0.333333
            HA       Table     1.000000
            HH       Pen       1.000000
    2       Elec     light     0.000000
            HA       Table     1.000000
    3       Elec     light     1.000000
            HH       Pen       0.000000
    

    The probability that they do not purchase anything is simply the complement of that (i.e. 1 - prob).
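To get from these per-group probabilities to the "best" combination the question asks for, one option is to select the rows with the maximum value. The sketch below rebuilds the input table by hand and, because several groups tie at 1.0 in this data, returns every tied combination rather than a single one. The names `keys`, `n_obs`, `best`, and `summary` are illustrative, and using the observation count as a tie-breaker is an assumption, not something the question specifies.

```python
import pandas as pd

# Rebuild the input table from the question.
df = pd.DataFrame({
    "cust_Id":   [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "category":  ["Elec", "Elec", "Elec", "HA", "HH", "Elec", "HA", "HH", "Elec"],
    "product":   ["light", "light", "light", "Table", "Pen", "light", "Table", "Pen", "light"],
    "purchased": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

keys = ["cust_Id", "category", "product"]

# For a 0/1 column, the group mean equals sum / count.
prob = df.groupby(keys)["purchased"].mean()

# Number of rows behind each estimate (assumed tie-breaker, not from the question).
n_obs = df.groupby(keys)["purchased"].size()

# Every combination that shares the maximum probability.
best = prob[prob == prob.max()]

# Side-by-side view, highest probability first, then most observations.
summary = pd.DataFrame({"prob": prob, "n_obs": n_obs})
print(summary.sort_values(["prob", "n_obs"], ascending=False))
print(best)
```

With only one observation in most groups, a probability of 1.0 is weak evidence, so it may be worth looking at `n_obs` before declaring a single winner. `prob.idxmax()` would also work, but it returns only the first of the tied combinations.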