My dataframe (df) is a 12 months data which consist of 5m rows. One of the columns is day_of_week
which are Monday to Sunday. This df also has a unique key which is the ride_id
column. I want to calculate the average number of rides per day_of_week. I have calculated the number of rides per day_of_week using
copydf.groupby(['day_of_week']).agg(number_of_rides=('day_of_week', 'count'))
However, I find it hard to calculate the mean/average for each day of week. I have tried:
copydf.groupby(['day_of_week']).agg(number_of_rides=('ride_id', 'count')).mean()
and
avg_days = copydf.groupby(['day_of_week']).agg(number_of_rides=('ride_id', 'count'))
avg_days.groupby(['day_of_week']).agg('number_of_rides', 'mean')
They didn't work. I want the output to be in three columns, day_of_week, number_of_rides, and avg_num_of_ride or two columns day_of_week or weekday_num and avg_num_of_rides
This is my df. kindly note that code block have tampered with some columns line due to the long column names.
ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_or_casual ride_length year month day_of_week hour weekday_num
0 9DC7B962304CBFD8 electric_bike 2021-09-28 16:07:10 2021-09-28 16:09:54 Streeter Dr & Grand Ave 13022 Streeter Dr & Grand Ave 13022 41.89 -87.68 41.89 -87.67 casual 2 2021 September Tuesday 16 1
1 F930E2C6872D6B32 electric_bike 2021-09-28 14:24:51 2021-09-28 14:40:05 Streeter Dr & Grand Ave 13022 Streeter Dr & Grand Ave 13022 41.94 -87.64 41.98 -87.67 casual 15 2021 September Tuesday 14 1
2 6EF72137900BB910 electric_bike 2021-09-28 00:20:16 2021-09-28 00:23:57 Streeter Dr & Grand Ave 13022 Streeter Dr & Grand Ave 13022 41.81 -87.72 41.80 -87.72 casual 3 2021 September Tuesday 0 1
This is the output I desire
number_of_rides average_number_of_rides
day_of_week
Saturday 964079 50.4
Sunday 841919 70.9
Wednesday 840272 90.2
Thursday 836973 77.2
Friday 818205 34.4
Tuesday 814496 34.4
Monday 767002 200.3
Again, I have calculated the number of ride per day_of_week, what I want to do is just to add the third column or better still, have average_ride per weekday(Monday or 0, Tuesday or 1, Wednesday or 2) on its own output df
Thanks
To get average number of rides per week day, you need total rides on that week day and number of weeks.
You can compute the week number from date:
df["week_number"] = df["started_at"].dt.isocalendar().week
>> ride_id started_at day_of_week week_number
>> 0 1 2021-09-20 Monday 38
>> 1 2 2021-09-21 Tuesday 38
>> 2 3 2021-09-20 Monday 38
>> 3 4 2021-09-21 Tuesday 38
>> 4 5 2021-09-27 Monday 39
>> 5 6 2021-09-28 Tuesday 39
Then group by day_of_week
and week_number
to compute an aggregate dataframe:
week_number_group_df = df.groupby(["day_of_week", "week_number"]).agg(number_of_rides_on_day=("ride_id", "count"))
>> number_of_rides_on_day
>> day_of_week week_number
>> Monday 38 2
>> 39 1
>> Tuesday 38 2
>> 39 1
Use the aggregated dataframe to get the final results:
week_number_group_df.groupby("day_of_week").agg(number_of_rides=("number_of_rides_on_day", "sum"), average_number_of_rides=("number_of_rides_on_day", "mean"))
>> number_of_rides average_number_of_rides
>> day_of_week
>> Monday 3 1.5000
>> Tuesday 3 1.5000