python pandas dataframe timedelta datetimerangefield

How to know time people need to pay for the services I offer

I have data that contains id, gender, price, time to pay. For example:

import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
                    'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
                    'price': [250, 1000,300, 250, 1000, 500, 450, 500],
                    'timeToPay':['0 days 01:20:00','1 days 03:24:02','0 days 12:45:55','0 days 05:38:20','0 days 02:44:12','0 days 11:25:38','1 days 01:11:00','0 days 05:22:00']})

Time to pay, is the time difference between when the customer orders and pays (datatype timedelta64[ns]).

How I can get the best time to pay range for this data, I mean, do people pay between 0-1 hours or 4-6 hours or maybe 1 day - 2 days. I want to know how long people pay for the services I offer.

I try to group by the data based on time to pay but I think it doesn't give information I need

Solution

IIUC,

I modified you code a little bit to make it easier to reproduce.

import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
                    'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
                    'price': [250, 1000,300, 250, 1000, 500, 450, 500],
                    'timeToPay':[ '0 days 01:20:00'
                                 ,'1 days 03:24:02'
                                 ,'0 days 12:45:55'
                                 ,'0 days 05:38:20'
                                 ,'0 days 02:44:12'
                                 ,'0 days 11:25:38'
                                 ,'1 days 01:11:00'
                                 ,'0 days 05:22:00']})
df1['timeToPay']=df1['timeToPay'].apply(lambda x: pd.Timedelta(x))

now timeToPay is a timedelta, then you may transform timeToPay to hour and day with this snippet.

import math
df1['timeToPay_hour']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(60*60)))
df1['timeToPay_day']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(24*60*60)))

df1

Now, your df1 looks like this

	id	gender	price	timeToPay	timeToPay_hour	timeToPay_day
0	1	Male	250	0 days 01:20:00	2	1
1	2	Female	1000	1 days 03:24:02	28	2
2	3	Male	300	0 days 12:45:55	13	1
3	4	Female	250	0 days 05:38:20	6	1
4	5	Male	1000	0 days 02:44:12	3	1
5	6	Female	500	0 days 11:25:38	12	1
6	7	Male	450	1 days 01:11:00	26	2
7	8	Male	500	0 days 05:22:00	6	1

Then, you may compare with gender with timeToPay_hour like this.

df1[['gender','timeToPay_hour']].hist(bins=5)

Hope this help.