Search code examples
pandasgroupcategorical

Grouping data month-wise with Categorical data in pandas


How can I group data into months from dates where a data frame has both categorical and numerical data in pandas. I tried the groupby function but I think it won't work with categorical data. There are multiple values in the categorical column. Sample data:

Date Campaign_Name No_of_Male_Viewers No_of_Female_Viewers
2021-06-12 Dove_birds 1268 7656
2021-02-05 Pantene_winner 657 8964
2021-09-15 Budweiser_wazap 7642 76
2021-05-13 Pantene_winner 425 6578
2021-12-12 Budweiser_wazap 9867 111
2021-09-09 Dove_birds 1578 11456
2021-05-24 Pantene_winner 678 7475
2021-09-27 Budweiser_wazap 8742 96
2021-09-09 Dove_soft 1175 15486

Now I need to group the data months wise and show for example that Budweiser_wazap in September gained a total audience of xxxx and in December gained xxxx audience and so on for the other campaigns as well.

Expected output sample:

Month Campaign_Name No_of_Male_Viewers No_of_Female_Viewers
February Pantene_winner 657 8964
September Budweiser_wazap 16384 172

Since Budweiser_wazap campaign ran twice in September, the resulting output for No_of_Male_Viewers is: 7642 + 8742 = 16384, and for No_of_Female_Viewers is: 76 + 96 = 172.


Solution

  • USE-

    ##Get Month Name for each date
    df['Month'] = df['Date'].dt.month_name()
    
    #Groupby `Month` & `Campaign_Name`
    df.groupby(['Month', 'Campaign_Name'])[['No_of_Male_viewers', 'No_of_Female_viewers']].sum().reset_index()
    df
    

    Sample Reproducible code-

    import pandas as pd
    import numpy as np
    from pandas import DataFrame
    
    df = pd.DataFrame({
        'Date' : ['2015-06-08', '2015-08-05', '2015-05-06', '2015-05-05', '2015-07-08', '2015-05-07', '2015-06-05', '2015-07-05'], 
        'Sym'  : ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'], 
        'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
        'Data3': [5, 8, 6, 1, 50, 100, 60, 120]
    })
    df['Date'] = pd.to_datetime(df['Date'])
    df['Month'] = df['Date'].dt.month_name()
    df
    

    df output-

    Date           Sym  Data2   Data3  Month
    0   2015-06-08  aapl    11  5   June
    1   2015-08-05  aapl    8   8   August
    2   2015-05-06  aapl    10  6   May
    3   2015-05-05  aapl    15  1   May
    4   2015-07-08  aaww    110 50  July
    5   2015-05-07  aaww    60  100 May
    6   2015-06-05  aaww    100 60  June
    7   2015-07-05  aaww    40  120 July
    

    Groupby Condition-

    df.groupby(['Month', 'Sym'])[['Data2', 'Data3']].sum().reset_index()
    

    Output-

        Month   Sym   Data2 Data3
    0   August  aapl    8   8
    1   July    aaww    150 170
    2   June    aapl    11  5
    3   June    aaww    100 60
    4   May     aapl    25  7
    5   May     aaww    60  100
    

    Ref link- Pandas - dataframe groupby - how to get sum of multiple columns