Below is the sample data
year <- c (2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021)
indcode <- c(71,71,71,71,71,71,72,72,72,72,72,72,44,44,44,44,44,44,45,45,45,45,45,45)
avgemp <- c(44,46,48,50,55,56,10,11,12,13,14,15,21,22,22,23,25,25,61,62,62,63,69,77)
ownership <-c(50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
test3 <- data.frame (year,indcode,avgemp,ownership)
The desired result is to have where it sums the avgemp for two specific combinations (71+72 and 44+45) and produces one additional row per year. The items in the parentheses (below) are just there to illustrate which numbers get added. The primary source of my confusion is how to have it select and therefore add certain indcode combinations. My initial thought is that you would pivot wider, add the columns, and the pivot_longer but hoping for something a bit less convoluted.
year indcode avgemp ownership
2016 71+72 54 (44+10) 50
2016 71 44 50
2016 72 10
2017 71+72 57 (46+11) 50
2018 71+72 60 (48+12) 50
2019 71+72 63 (50+13) 50
2020 71+72 69 (55+14) 50
2021 71+72 71 (56+15) 50
I know that it would start something like this
test3 <- test3 %>% group_by (indcode) %>% mutate("71+72" = (something that filters out 71 and 72)
Using data.table
- convert the data.frame to 'data.table' with setDT
, grouped by 'year', 'ownership', and the 'indcode' created by an ifelse/fcase method), get the sum
of 'avgemp' as a summarised output
library(data.table)
setDT(test3)[, .(avgemp = sum(avgemp)), .(year, ownership,
indcode = fcase(indcode %in% c(71, 72), '71+72', default = '44+45'))]
-output
year ownership indcode avgemp
<num> <num> <char> <num>
1: 2016 50 71+72 54
2: 2017 50 71+72 57
3: 2018 50 71+72 60
4: 2019 50 71+72 63
5: 2020 50 71+72 69
6: 2021 50 71+72 71
7: 2016 50 44+45 82
8: 2017 50 44+45 84
9: 2018 50 44+45 84
10: 2019 50 44+45 86
11: 2020 50 44+45 94
12: 2021 50 44+45 102