I have data that contains id, gender, price, time to pay. For example:
import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
'price': [250, 1000,300, 250, 1000, 500, 450, 500],
'timeToPay':['0 days 01:20:00','1 days 03:24:02','0 days 12:45:55','0 days 05:38:20','0 days 02:44:12','0 days 11:25:38','1 days 01:11:00','0 days 05:22:00']})
Time to pay, is the time difference between when the customer orders and pays (datatype timedelta64[ns]).
How I can get the best time to pay range for this data, I mean, do people pay between 0-1 hours or 4-6 hours or maybe 1 day - 2 days. I want to know how long people pay for the services I offer.
I try to group by the data based on time to pay but I think it doesn't give information I need
IIUC,
I modified you code a little bit to make it easier to reproduce.
import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
'price': [250, 1000,300, 250, 1000, 500, 450, 500],
'timeToPay':[ '0 days 01:20:00'
,'1 days 03:24:02'
,'0 days 12:45:55'
,'0 days 05:38:20'
,'0 days 02:44:12'
,'0 days 11:25:38'
,'1 days 01:11:00'
,'0 days 05:22:00']})
df1['timeToPay']=df1['timeToPay'].apply(lambda x: pd.Timedelta(x))
now timeToPay
is a timedelta
, then you may transform timeToPay
to hour
and day
with this snippet.
import math
df1['timeToPay_hour']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(60*60)))
df1['timeToPay_day']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(24*60*60)))
df1
Now, your df1
looks like this
id | gender | price | timeToPay | timeToPay_hour | timeToPay_day | |
---|---|---|---|---|---|---|
0 | 1 | Male | 250 | 0 days 01:20:00 | 2 | 1 |
1 | 2 | Female | 1000 | 1 days 03:24:02 | 28 | 2 |
2 | 3 | Male | 300 | 0 days 12:45:55 | 13 | 1 |
3 | 4 | Female | 250 | 0 days 05:38:20 | 6 | 1 |
4 | 5 | Male | 1000 | 0 days 02:44:12 | 3 | 1 |
5 | 6 | Female | 500 | 0 days 11:25:38 | 12 | 1 |
6 | 7 | Male | 450 | 1 days 01:11:00 | 26 | 2 |
7 | 8 | Male | 500 | 0 days 05:22:00 | 6 | 1 |
Then, you may compare with gender with timeToPay_hour
like this.
df1[['gender','timeToPay_hour']].hist(bins=5)
Hope this help.