I have count data from different regions per year. The original data is structured like this:
   count region year
1      1      A 2011
2      2      A 2010
3      1      A 2009
4      5      A 2008
5      4      A 2007
6      2      B 2011
7      2      B 2010
8      1      B 2009
9      5      B 2008
10     3      B 2007
11     3      C 2011
12     3      C 2010
13     2      C 2009
14     1      C 2008
15     3      C 2007
16     4      D 2011
17     3      D 2010
18     2      D 2009
19     1      D 2008
20     4      D 2007
I now need to sum the counts for regions A and D per year, and label the combined rows as region A in the region column. The output should look like this:
   count region year
1      5      A 2011
2      5      A 2010
3      3      A 2009
4      6      A 2008
5      8      A 2007
6      2      B 2011
7      2      B 2010
8      1      B 2009
9      5      B 2008
10     3      B 2007
11     3      C 2011
12     3      C 2010
13     2      C 2009
14     1      C 2008
15     3      C 2007
The counts for regions B and C should not change. I tried several approaches but never got the needed output. Does anyone have a tip? I would be very grateful.
We may replace the D values in region with A, then do a group_by on region and year and sum the counts:
library(dplyr)
df1 %>%
  group_by(region = replace(region, region == 'D', 'A'), year) %>%
  summarise(count = sum(count), .groups = 'drop')
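For a reproducible check, here is a minimal sketch that rebuilds the example data from the question and runs the same pipeline. It assumes the data frame is called df1 and that region is a character column (not a factor). The arrange() and select() steps are my additions; they are only there to match the row and column order of the desired output, since group_by() would otherwise sort the years in ascending order.

```r
library(dplyr)

# Rebuild the example data from the question (assumed name: df1)
df1 <- data.frame(
  count  = c(1, 2, 1, 5, 4,   # region A, 2011..2007
             2, 2, 1, 5, 3,   # region B
             3, 3, 2, 1, 3,   # region C
             4, 3, 2, 1, 4),  # region D
  region = rep(c("A", "B", "C", "D"), each = 5),
  year   = rep(2011:2007, times = 4)
)

result <- df1 %>%
  # Relabel D as A so both regions fall into the same group
  group_by(region = replace(region, region == "D", "A"), year) %>%
  # Sum the counts within each region/year group
  summarise(count = sum(count), .groups = "drop") %>%
  # Optional: newest year first and original column order
  arrange(region, desc(year)) %>%
  select(count, region, year)

print(result)
```

The rows for B and C pass through unchanged because each of their region/year groups contains a single row, so the sum equals the original count.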