I have a dataset with dates (it could be any other kind of categorical data), and I want to count the number of occurrences by day of the week (and/or by month or year, with the same logic).
I have something like this for the day of the week (in French):
import numpy as np
import pandas as pd
# this is a simulation; of course my data is more complex ;-)
data = {'date': ["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)
# date conversion
df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.day_name(locale = 'fr_FR')
# group by
data = df.groupby('day_of_week')['day_of_week'].agg(['count'])
data.reset_index(level=0, inplace=True)
# order by... order of the day in the week
JOUR_SEMAINE = ["Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche"]
MAPPING_SEMAINE = {day: i for i, day in enumerate(JOUR_SEMAINE)}
key = data['day_of_week'].map(MAPPING_SEMAINE)
data = data.iloc[key.argsort()]
# see
data
. | day_of_week | count |
---|---|---|
0 | Lundi | 1 |
1 | Mercredi | 1 |
2 | Samedi | 2 |
This is nice, but how can I fill the absent days with 0 to obtain this:
. | day_of_week | count |
---|---|---|
0 | Lundi | 1 |
1 | Mardi | 0 |
2 | Mercredi | 1 |
3 | Jeudi | 0 |
4 | Vendredi | 0 |
5 | Samedi | 2 |
6 | Dimanche | 0 |
More generally, I'm looking for the most automatic way to get one value for each entry of my category list (in this case JOUR_SEMAINE): either the calculated value (count) or 0.
Does anyone have a clue?
Thanks in advance
First, I would create a data frame indexed by the categories. We can get the weekday names from the calendar module.
import pandas as pd
import calendar as cal
weekdays = list(cal.day_name)
weekday_count_df = pd.DataFrame(index=weekdays)
weekday_count_df
Next, we put your data into a data frame.
data = {'date': ["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"], 'col_2': ['a', 'b', 'c', 'd']}
data_df = pd.DataFrame(data)
data_df['date'] = pd.to_datetime(data_df['date'])
data_df['day_of_week'] = data_df['date'].dt.day_name()
data_df
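If the labels should be French, as in the question, one locale-independent sketch is to derive them from `dt.dayofweek` (Monday is 0) instead of `day_name(locale=...)`, which generally needs the `fr_FR` locale to be installed on the machine. `JOUR_SEMAINE` is the list from the question; the variable name `jour` is only an illustrative choice.

```python
import pandas as pd

# Day names in week order, taken from the question
JOUR_SEMAINE = ["Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche"]

dates = pd.to_datetime(pd.Series(["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"]))

# dt.dayofweek is 0 for Monday ... 6 for Sunday, whatever the system locale is
jour = dates.dt.dayofweek.map(dict(enumerate(JOUR_SEMAINE)))
print(jour.tolist())
```

The resulting series can be used in place of `day_of_week` in the steps that follow, with `JOUR_SEMAINE` as the index of the category frame.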
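Next, we count the rows per weekday and join the counts onto the weekday frame. Days that have no data come out of the join as NaN, so we fill them with 0.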
data_group_df = data_df.groupby('day_of_week')['date'].count()
weekday_count_df = weekday_count_df.join(data_group_df)
weekday_count_df = weekday_count_df.fillna(0)
weekday_count_df
Finally, we rename the column and cast it back to integers (the NaN values introduced by the join turned it into floats).
weekday_count_df = weekday_count_df.rename(columns={'date':'count'})
weekday_count_df['count'] = weekday_count_df['count'].astype('int32')
weekday_count_df
The final result would be this:
count
Monday 1
Tuesday 0
Wednesday 1
Thursday 0
Friday 0
Saturday 2
Sunday 0
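For the more generic case raised in the question (one row per category, with 0 where nothing was observed), a shorter sketch is to count with `value_counts()` and then `reindex` against the full list of categories using `fill_value=0`. The helper name `count_with_zeros` is hypothetical, not a pandas function, and the example assumes the default English day names.

```python
import calendar

import pandas as pd


def count_with_zeros(values: pd.Series, categories: list) -> pd.Series:
    """Count occurrences of each category, returning 0 for absent ones."""
    # reindex keeps the order of `categories` and fills missing labels with 0
    return values.value_counts().reindex(categories, fill_value=0)


dates = pd.to_datetime(pd.Series(["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"]))

# Same logic for two different groupings: weekday names and month numbers
by_weekday = count_with_zeros(dates.dt.day_name(), list(calendar.day_name))
by_month = count_with_zeros(dates.dt.month, list(range(1, 13)))

print(by_weekday)
print(by_month)
```

Because no NaN is ever introduced, the counts stay integers and no cast is needed afterwards. The same call should work with the French `JOUR_SEMAINE` list, provided the series holds French day names.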