
How to sum values with multiple conditions per year in R


I have count data from different regions per year. The original data is structured like this:

   count region year
1      1      A 2011
2      2      A 2010
3      1      A 2009
4      5      A 2008
5      4      A 2007
6      2      B 2011
7      2      B 2010
8      1      B 2009
9      5      B 2008
10     3      B 2007
11     3      C 2011
12     3      C 2010
13     2      C 2009
14     1      C 2008
15     3      C 2007
16     4      D 2011
17     3      D 2010
18     2      D 2009
19     1      D 2008
20     4      D 2007

I now need to sum the counts of regions A and D for each year and label the combined rows as region A in the region column. The output should look like this:

   count region year
1      5      A 2011
2      5      A 2010
3      3      A 2009
4      6      A 2008
5      8      A 2007
6      2      B 2011
7      2      B 2010
8      1      B 2009
9      5      B 2008
10     3      B 2007
11     3      C 2011
12     3      C 2010
13     2      C 2009
14     1      C 2008
15     3      C 2007

The counts for regions B and C should stay unchanged. I have tried several approaches but never got the required output. Does anyone have a tip? I would be very grateful.


Solution

  • We can replace D with A in the region column, then group by region and year and sum the counts

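    library(dplyr)
    df1 %>% 
      # relabel D as A inside group_by so both regions share one group per year
      group_by(region = replace(region, region == 'D', 'A'), year) %>% 
      # one row per region/year holding the summed count
      summarise(count = sum(count), .groups = 'drop')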
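
For anyone who wants to run this end to end, here is a minimal reproducible sketch. It rebuilds the example data under the name df1 (the name used in the answer) and assumes region is a character column. The arrange() and select() steps are an addition, not part of the answer above: summarise() returns the grouping columns first and sorts the groups in ascending order, so those two steps only restore the row and column order shown in the question. The name result is purely illustrative.

```r
library(dplyr)

# Rebuild the example data from the question.
# Assumption: region is stored as character, not factor.
df1 <- data.frame(
  count  = c(1, 2, 1, 5, 4,  2, 2, 1, 5, 3,  3, 3, 2, 1, 3,  4, 3, 2, 1, 4),
  region = rep(c("A", "B", "C", "D"), each = 5),
  year   = rep(2011:2007, times = 4),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # Relabel D as A so both regions fall into the same group per year
  mutate(region = replace(region, region == "D", "A")) %>%
  group_by(region, year) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  # Optional: match the layout in the question (newest year first, count first)
  arrange(region, desc(year)) %>%
  select(count, region, year)

print(result)
```

Regions B and C pass through unchanged because each of their region/year groups contains a single row, so the sum equals the original count.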