Search code examples
pythonpandasdataframepandas-groupbypandas-loc

Identify number of occurrence grouped by 'Year' - continued


This is a continuation of a previous question.

Now, I want to have these occurrences grouped by year so that the output would become something like:

        Combo  Occurrence (2017)    Occurrence (2018)    Occurrence (2019)   Occurrence (2020)
0      DK,NO4           2                   x                   x                   x
1     DK1,NO1           1                   x                   x                   x
2  DK,NO1,NO2           1                   x                   x                   x
3       DK,NO           1                   x                   x                   x

The input data looks like this:

      Year  Month  Day  Weekday   NO1  ...    SE3    SE4     FI    DK1    DK2
0      2017      1    1        7  28.4  ...  24.03  24.03  24.03  20.96  20.96
1      2017      1    1        7  28.2  ...  25.05  25.05  25.05  25.05  25.05
2      2017      1    1        7  28.0  ...  25.05  25.05  25.05  25.05  25.05
3      2017      1    1        7  28.0  ...  23.19  23.19  23.19  16.03  16.03
4      2017      1    1        7  28.0  ...  24.10  24.10  24.10  16.43  16.43
...     ...    ...  ...      ...   ...  ...    ...    ...    ...    ...    ...
35063  2020     12   31        4  31.0  ...  31.00  58.04  35.32  89.35  89.35
35064  2020     12   31        4  24.8  ...  24.84  54.45  24.84  56.70  56.70
35065  2020     12   31        4  24.8  ...  24.77  51.18  28.00  52.44  52.44
35066  2020     12   31        4  24.6  ...  24.61  45.84  26.55  51.86  51.86
35067  2020     12   31        4  24.1  ...  24.07  24.07  24.07  78.66  78.66

Solution

  • Idea is convert all columns with no processing strings to index and then is used SeriesGroupBy.value_counts for counts values per years:

    df = df.set_index(['Year','Month','Day','Weekday'])
    
    df = (df.eq(df['DK1 Up'], axis=0)
            .dot(df.columns + ',')
            .str[:-1]
            .to_frame('Combo')
            .groupby('Year')['Combo']
            .value_counts()
            .unstack(0, fill_value=0)
            .add_prefix('Occurrence ')
            .rename_axis(columns=None)
            .reset_index()
            )
    print (df)
                                 Combo  Occurrence 2017  Occurrence 2020
    0  DK1 Up,DK1 Down,DK2 Up,DK2 Down                2                0
    1                    DK1 Up,DK2 Up                3                5