Search code examples
pythonpandasdataframetimedeltadatetimerangefield

How to know time people need to pay for the services I offer


I have data that contains id, gender, price, time to pay. For example:

import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
                    'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
                    'price': [250, 1000,300, 250, 1000, 500, 450, 500],
                    'timeToPay':['0 days 01:20:00','1 days 03:24:02','0 days 12:45:55','0 days 05:38:20','0 days 02:44:12','0 days 11:25:38','1 days 01:11:00','0 days 05:22:00']})

Time to pay, is the time difference between when the customer orders and pays (datatype timedelta64[ns]).

How I can get the best time to pay range for this data, I mean, do people pay between 0-1 hours or 4-6 hours or maybe 1 day - 2 days. I want to know how long people pay for the services I offer.

I try to group by the data based on time to pay but I think it doesn't give information I need


Solution

  • IIUC,

    I modified you code a little bit to make it easier to reproduce.

    import pandas as pd
    df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
                        'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
                        'price': [250, 1000,300, 250, 1000, 500, 450, 500],
                        'timeToPay':[ '0 days 01:20:00'
                                     ,'1 days 03:24:02'
                                     ,'0 days 12:45:55'
                                     ,'0 days 05:38:20'
                                     ,'0 days 02:44:12'
                                     ,'0 days 11:25:38'
                                     ,'1 days 01:11:00'
                                     ,'0 days 05:22:00']})
    df1['timeToPay']=df1['timeToPay'].apply(lambda x: pd.Timedelta(x))
    

    now timeToPay is a timedelta, then you may transform timeToPay to hour and day with this snippet.

    import math
    df1['timeToPay_hour']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(60*60)))
    df1['timeToPay_day']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(24*60*60)))
    
    df1
    

    Now, your df1 looks like this

    id gender price timeToPay timeToPay_hour timeToPay_day
    0 1 Male 250 0 days 01:20:00 2 1
    1 2 Female 1000 1 days 03:24:02 28 2
    2 3 Male 300 0 days 12:45:55 13 1
    3 4 Female 250 0 days 05:38:20 6 1
    4 5 Male 1000 0 days 02:44:12 3 1
    5 6 Female 500 0 days 11:25:38 12 1
    6 7 Male 450 1 days 01:11:00 26 2
    7 8 Male 500 0 days 05:22:00 6 1

    Then, you may compare with gender with timeToPay_hour like this.

    df1[['gender','timeToPay_hour']].hist(bins=5)
    

    Hope this help.