This is a continuation of a previous question.
Now, I want to have these occurrences grouped by year so that the output would become something like:
Combo Occurrence (2017) Occurrence (2018) Occurrence (2019) Occurrence (2020)
0 DK,NO4 2 x x x
1 DK1,NO1 1 x x x
2 DK,NO1,NO2 1 x x x
3 DK,NO 1 x x x
The input data looks like this:
Year Month Day Weekday NO1 ... SE3 SE4 FI DK1 DK2
0 2017 1 1 7 28.4 ... 24.03 24.03 24.03 20.96 20.96
1 2017 1 1 7 28.2 ... 25.05 25.05 25.05 25.05 25.05
2 2017 1 1 7 28.0 ... 25.05 25.05 25.05 25.05 25.05
3 2017 1 1 7 28.0 ... 23.19 23.19 23.19 16.03 16.03
4 2017 1 1 7 28.0 ... 24.10 24.10 24.10 16.43 16.43
... ... ... ... ... ... ... ... ... ... ... ...
35063 2020 12 31 4 31.0 ... 31.00 58.04 35.32 89.35 89.35
35064 2020 12 31 4 24.8 ... 24.84 54.45 24.84 56.70 56.70
35065 2020 12 31 4 24.8 ... 24.77 51.18 28.00 52.44 52.44
35066 2020 12 31 4 24.6 ... 24.61 45.84 26.55 51.86 51.86
35067 2020 12 31 4 24.1 ... 24.07 24.07 24.07 78.66 78.66
Idea is convert all columns with no processing strings to index
and then is used SeriesGroupBy.value_counts
for counts values per years:
df = df.set_index(['Year','Month','Day','Weekday'])
df = (df.eq(df['DK1 Up'], axis=0)
.dot(df.columns + ',')
.str[:-1]
.to_frame('Combo')
.groupby('Year')['Combo']
.value_counts()
.unstack(0, fill_value=0)
.add_prefix('Occurrence ')
.rename_axis(columns=None)
.reset_index()
)
print (df)
Combo Occurrence 2017 Occurrence 2020
0 DK1 Up,DK1 Down,DK2 Up,DK2 Down 2 0
1 DK1 Up,DK2 Up 3 5