Search code examples
rdplyrrow

How best to do row operations in R


Below is the sample data

   year <- c (2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021)
  indcode <- c(71,71,71,71,71,71,72,72,72,72,72,72,44,44,44,44,44,44,45,45,45,45,45,45)
  avgemp <- c(44,46,48,50,55,56,10,11,12,13,14,15,21,22,22,23,25,25,61,62,62,63,69,77)
  ownership <-c(50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)

  test3 <- data.frame (year,indcode,avgemp,ownership)

The desired result is to have where it sums the avgemp for two specific combinations (71+72 and 44+45) and produces one additional row per year. The items in the parentheses (below) are just there to illustrate which numbers get added. The primary source of my confusion is how to have it select and therefore add certain indcode combinations. My initial thought is that you would pivot wider, add the columns, and the pivot_longer but hoping for something a bit less convoluted.

  year     indcode        avgemp          ownership
  2016       71+72         54 (44+10)         50
  2016         71          44                 50
  2016         72          10
  2017       71+72         57 (46+11)         50
  2018       71+72         60 (48+12)         50
  2019       71+72         63 (50+13)         50
  2020       71+72         69 (55+14)         50
  2021       71+72         71 (56+15)         50

I know that it would start something like this

 test3 <- test3 %>% group_by (indcode) %>% mutate("71+72" = (something that filters out 71 and  72) 
  

Solution

  • Using data.table - convert the data.frame to 'data.table' with setDT, grouped by 'year', 'ownership', and the 'indcode' created by an ifelse/fcase method), get the sum of 'avgemp' as a summarised output

    library(data.table)
    setDT(test3)[, .(avgemp = sum(avgemp)), .(year, ownership, 
      indcode = fcase(indcode %in% c(71, 72), '71+72', default = '44+45'))]
    

    -output

       year ownership indcode avgemp
        <num>     <num>  <char>  <num>
     1:  2016        50   71+72     54
     2:  2017        50   71+72     57
     3:  2018        50   71+72     60
     4:  2019        50   71+72     63
     5:  2020        50   71+72     69
     6:  2021        50   71+72     71
     7:  2016        50   44+45     82
     8:  2017        50   44+45     84
     9:  2018        50   44+45     84
    10:  2019        50   44+45     86
    11:  2020        50   44+45     94
    12:  2021        50   44+45    102