Search code examples
pythonpandasnumpylambda

How do I calculate average number of ride per week from Total Count in Pandas/Python


My dataframe (df) is a 12 months data which consist of 5m rows. One of the columns is day_of_week which are Monday to Sunday. This df also has a unique key which is the ride_id column. I want to calculate the average number of rides per day_of_week. I have calculated the number of rides per day_of_week using

copydf.groupby(['day_of_week']).agg(number_of_rides=('day_of_week', 'count'))

However, I find it hard to calculate the mean/average for each day of week. I have tried:

copydf.groupby(['day_of_week']).agg(number_of_rides=('ride_id', 'count')).mean()

and

avg_days = copydf.groupby(['day_of_week']).agg(number_of_rides=('ride_id', 'count'))
avg_days.groupby(['day_of_week']).agg('number_of_rides', 'mean')

They didn't work. I want the output to be in three columns, day_of_week, number_of_rides, and avg_num_of_ride or two columns day_of_week or weekday_num and avg_num_of_rides

This is my df. kindly note that code block have tampered with some columns line due to the long column names.

    ride_id rideable_type   started_at  ended_at    start_station_name  start_station_id    end_station_name    end_station_id  start_lat   start_lng   end_lat end_lng member_or_casual    ride_length year    month   day_of_week hour    weekday_num
0   9DC7B962304CBFD8    electric_bike   2021-09-28 16:07:10 2021-09-28 16:09:54 Streeter Dr & Grand Ave 13022   Streeter Dr & Grand Ave 13022   41.89   -87.68  41.89   -87.67  casual  2   2021    September   Tuesday 16  1
1   F930E2C6872D6B32    electric_bike   2021-09-28 14:24:51 2021-09-28 14:40:05 Streeter Dr & Grand Ave 13022   Streeter Dr & Grand Ave 13022   41.94   -87.64  41.98   -87.67  casual  15  2021    September   Tuesday 14  1
2   6EF72137900BB910    electric_bike   2021-09-28 00:20:16 2021-09-28 00:23:57 Streeter Dr & Grand Ave 13022   Streeter Dr & Grand Ave 13022   41.81   -87.72  41.80   -87.72  casual  3   2021    September   Tuesday 0   1

This is the output I desire

    number_of_rides average_number_of_rides
day_of_week 
Saturday    964079  50.4
Sunday  841919       70.9
Wednesday   840272   90.2
Thursday    836973    77.2
Friday  818205        34.4
Tuesday 814496       34.4
Monday  767002        200.3

Again, I have calculated the number of ride per day_of_week, what I want to do is just to add the third column or better still, have average_ride per weekday(Monday or 0, Tuesday or 1, Wednesday or 2) on its own output df

Thanks


Solution

  • To get average number of rides per week day, you need total rides on that week day and number of weeks.

    You can compute the week number from date:

    df["week_number"] = df["started_at"].dt.isocalendar().week
    
    >>    ride_id started_at day_of_week  week_number
    >> 0        1 2021-09-20      Monday           38
    >> 1        2 2021-09-21     Tuesday           38
    >> 2        3 2021-09-20      Monday           38
    >> 3        4 2021-09-21     Tuesday           38
    >> 4        5 2021-09-27      Monday           39
    >> 5        6 2021-09-28     Tuesday           39
    

    Then group by day_of_week and week_number to compute an aggregate dataframe:

    week_number_group_df = df.groupby(["day_of_week", "week_number"]).agg(number_of_rides_on_day=("ride_id", "count"))
    
    >>                             number_of_rides_on_day
    >> day_of_week   week_number                          
    >> Monday        38                                  2
    >>               39                                  1
    >> Tuesday       38                                  2
    >>               39                                  1
    

    Use the aggregated dataframe to get the final results:

    week_number_group_df.groupby("day_of_week").agg(number_of_rides=("number_of_rides_on_day", "sum"), average_number_of_rides=("number_of_rides_on_day", "mean"))
    
    >>              number_of_rides  average_number_of_rides
    >> day_of_week                                          
    >> Monday                     3                   1.5000
    >> Tuesday                    3                   1.5000