Search code examples
rdataframedplyrrowsum

Rowsums conditional on column name in a loop


This is a follow-up question from this one: Rowsums conditional on column name

My data frame is called wiod and looks like this:

VAR1 VAR2 AUS1 ... AUS56 BEL1 ... BEL56 NLD1 ... NLD56
A    D    23   ... 99    0    ... 444   123  ... 675
B    D    55   ... 6456  0    ... 557   567  ... 4345

I'd like to calculate the row-sums for the variables AUS, BEL, NLD and then drop the old variables. Like this:

wiot <- wiot %>% 
  mutate(AUS = rowSums(.[grep("AUS", names(.))])) %>% 
  mutate(BEL = rowSums(.[grep("BEL", names(.))])) %>% 
  mutate(NLD = rowSums(.[grep("NLD", names(.))])) %>% 
  select(Var1, Var2, AUS, BEL, NLD)

Of course, there is a large number of the variable groups, not just these three (43, to be precise). Is there any convenient way to do this without using 43 mutate commands?


Solution

  • It makes it easier to convert from wide format to long (gather), then summarise, and if needed convert back to wide (spread) format:

    library(dplyr)
    library(tidyr)
    
    # dataframe from @989 http://stackoverflow.com/a/43519062
    df1 %>% 
      gather(key = myKey, value = myValue, -c(VAR1, VAR2)) %>% 
      mutate(myGroup = gsub("\\d", "", myKey)) %>% 
      group_by(VAR1, VAR2, myGroup) %>% 
      summarise(mySum = sum(myValue)) %>% 
      spread(key = myGroup, value = mySum)
    
    # Source: local data frame [2 x 5]
    # Groups: VAR1, VAR2 [2]
    # 
    #     VAR1   VAR2   AUS   BEL   NLD
    # * <fctr> <fctr> <int> <int> <int>
    # 1      A      D   122   444   798
    # 2      B      D  6511   557  4912