How can I group data into months from dates where a data frame has both categorical and numerical data in pandas. I tried the groupby function but I think it won't work with categorical data. There are multiple values in the categorical column. Sample data:
Date | Campaign_Name | No_of_Male_Viewers | No_of_Female_Viewers |
---|---|---|---|
2021-06-12 | Dove_birds | 1268 | 7656 |
2021-02-05 | Pantene_winner | 657 | 8964 |
2021-09-15 | Budweiser_wazap | 7642 | 76 |
2021-05-13 | Pantene_winner | 425 | 6578 |
2021-12-12 | Budweiser_wazap | 9867 | 111 |
2021-09-09 | Dove_birds | 1578 | 11456 |
2021-05-24 | Pantene_winner | 678 | 7475 |
2021-09-27 | Budweiser_wazap | 8742 | 96 |
2021-09-09 | Dove_soft | 1175 | 15486 |
Now I need to group the data months wise and show for example that Budweiser_wazap in September gained a total audience of xxxx and in December gained xxxx audience and so on for the other campaigns as well.
Expected output sample:
Month | Campaign_Name | No_of_Male_Viewers | No_of_Female_Viewers |
---|---|---|---|
February | Pantene_winner | 657 | 8964 |
September | Budweiser_wazap | 16384 | 172 |
Since Budweiser_wazap campaign ran twice in September, the resulting output for No_of_Male_Viewers is: 7642 + 8742 = 16384, and for No_of_Female_Viewers is: 76 + 96 = 172.
USE-
##Get Month Name for each date
df['Month'] = df['Date'].dt.month_name()
#Groupby `Month` & `Campaign_Name`
df.groupby(['Month', 'Campaign_Name'])[['No_of_Male_viewers', 'No_of_Female_viewers']].sum().reset_index()
df
Sample Reproducible code-
import pandas as pd
import numpy as np
from pandas import DataFrame
df = pd.DataFrame({
'Date' : ['2015-06-08', '2015-08-05', '2015-05-06', '2015-05-05', '2015-07-08', '2015-05-07', '2015-06-05', '2015-07-05'],
'Sym' : ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
'Data3': [5, 8, 6, 1, 50, 100, 60, 120]
})
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month_name()
df
df output-
Date Sym Data2 Data3 Month
0 2015-06-08 aapl 11 5 June
1 2015-08-05 aapl 8 8 August
2 2015-05-06 aapl 10 6 May
3 2015-05-05 aapl 15 1 May
4 2015-07-08 aaww 110 50 July
5 2015-05-07 aaww 60 100 May
6 2015-06-05 aaww 100 60 June
7 2015-07-05 aaww 40 120 July
Groupby Condition-
df.groupby(['Month', 'Sym'])[['Data2', 'Data3']].sum().reset_index()
Output-
Month Sym Data2 Data3
0 August aapl 8 8
1 July aaww 150 170
2 June aapl 11 5
3 June aaww 100 60
4 May aapl 25 7
5 May aaww 60 100
Ref link- Pandas - dataframe groupby - how to get sum of multiple columns