Tags: python, pandas, dataframe, probability

Finding Probability of an Element belonging to a List


I am using the BlackFriday.csv dataset. Below is an excerpt from it.

    User_ID  Gender  Age  City_Category  Purchase
0   1000001       1    0              1      8370
1   1000001       1    0              1     15200
2   1000001       1    0              1      1422
3   1000001       1    0              1      1057
4   1000003       0    2              1     15227
5   1000004       0    4              2     19215
6   1000004       0    4              2     15854
7   1000004       0    4              2     15686
8   1000005       0    2              1      5254
11  1000005       0    2              1     15665
12  1000006       1    5              1      2079
13  1000006       1    5              1     13055
14  1000006       1    5              1      8851
15  1000007       0    3              2     11788
16  1000008       0    2              0      8584

I want to know how to calculate the probability of a customer belonging to City_Category = 2.


Solution

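  • Each customer appears in several rows, so count every User_ID once and then take each City_Category's share of the distinct customers: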

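    # Keep one row per customer, then count customers in each city category.
    s1 = df.drop_duplicates('User_ID').groupby('City_Category')['User_ID'].count()
    # Convert the counts to percentages of all distinct customers.
    df2 = (s1 / s1.sum() * 100).to_frame()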
    

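    Output df2 (values are percentages of distinct customers):

                   User_ID
    City_Category
    0            14.285714
    1            57.142857
    2            28.571429

    So, for the excerpt above, the probability of a customer belonging to City_Category = 2 is about 28.57% (2 of the 7 distinct customers).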
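
If only a single probability is needed, as a fraction rather than a percentage, the same idea can be sketched with `value_counts(normalize=True)`. This is a minimal sketch, not the only way to do it: the helper name `city_probability` is made up for illustration, the DataFrame is rebuilt by hand from the excerpt in the question, and it assumes each User_ID maps to a single City_Category (so keeping the first row per user loses nothing).

```python
import pandas as pd

# Excerpt from the question, limited to the two columns used here.
df = pd.DataFrame({
    "User_ID": [1000001, 1000001, 1000001, 1000001, 1000003,
                1000004, 1000004, 1000004, 1000005, 1000005,
                1000006, 1000006, 1000006, 1000007, 1000008],
    "City_Category": [1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 0],
})


def city_probability(frame, category):
    """Fraction of distinct users whose City_Category equals `category`.

    Hypothetical helper; assumes one City_Category per User_ID.
    """
    # One row per customer, so repeat purchases do not skew the result.
    users = frame.drop_duplicates("User_ID")
    # Relative frequency of each category among distinct customers.
    probs = users["City_Category"].value_counts(normalize=True)
    # Categories that never occur get probability 0.
    return float(probs.get(category, 0.0))


p_city_2 = city_probability(df, 2)
print(round(p_city_2, 4))  # 0.2857
```

Multiplying the result by 100 gives the same 28.571429 shown in df2. On the full BlackFriday.csv file, `df` would come from `pd.read_csv` instead of the hand-built frame.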