CustomerID | Buying_Round | Date | Purchase_amount$ |
---|---|---|---|
1001 | 2 | 01/02/2020 | 20 |
1001 | NaN | 07/03/2020 | 42 |
1001 | NaN | 01/01/2020 | 15 |
1002 | NaN | 01/07/2020 | 10 |
1002 | NaN | 07/04/2020 | 40 |
1002 | NaN | 12/11/2020 | 25 |
1003 | 1 | 22/02/2020 | 30 |
1003 | NaN | 14/03/2020 | 52 |
1003 | NaN | 10/10/2020 | 45 |
Customer Id is the unique number of each customer based on some confidential data. This is for a grocery store which is trying to understand how a customer tends to buy based on the round he comes in and the amount he purchases to improve inventory. Buying round is the the nth time they have come to store.
What I have done is out of this jumbled info I sorted the customer Id I can also sort the date but wanted to keep the problem as original as possible. After sorting the dates it is evident that the date influences the Buying Round and I intent to keep the buying round for calculation of Round vs purchase. Now I want to fill the buying round in increasing order from 1- the number of times the unique customer occur and then start from 1 again for the next customer.
Expected Output :
CustomerID | Buying_Round | Date | Purchase_amount$ |
---|---|---|---|
1001 | 2 | 01/02/2020 | 20 |
1001 | 3 | 07/03/2020 | 42 |
1001 | 1 | 01/01/2020 | 15 |
1002 | 2 | 01/07/2020 | 10 |
1002 | 1 | 07/04/2020 | 40 |
1002 | 3 | 12/11/2020 | 25 |
1003 | 1 | 22/02/2020 | 30 |
1003 | 2 | 14/03/2020 | 52 |
1003 | 3 | 10/10/2020 | 45 |
Note: The 1001 is an example the original data have 1001 occur 12 times 1002 occur 4 times and 1003 occur 15 times with total 11000 unique customers for this year with no pattern or fixed values to determine how many number for each unique id, we sure have value_counts
but want to work with a simpler way other than hardcoding it.
We can try groupby rank
after converting the Date
column to_datetime
:
df['Buying_Round'] = (
pd.to_datetime(df['Date'], dayfirst=True)
.groupby(df['CustomerID']).rank(method='dense')
.astype(int)
)
Or with sort_values
and groupby cumcount
:
df['Buying_Round'] = (
pd.to_datetime(df['Date'], dayfirst=True)
.sort_values()
.groupby(df['CustomerID']).cumcount() + 1
)
Both Produce:
CustomerID Buying_Round Date Purchase_amount$
0 1001 2 01/02/2020 20
1 1001 3 07/03/2020 42
2 1001 1 01/01/2020 15
3 1002 2 01/07/2020 10
4 1002 1 07/04/2020 40
5 1002 3 12/11/2020 25
6 1003 1 22/02/2020 30
7 1003 2 14/03/2020 52
8 1003 3 10/10/2020 45