Search code examples
rdplyrconcatenationpastesummarize

Pass a concatenated string as column name in dplyr::summarise


I am trying perform dplyr summarize iteratively using concatenated string as column names

Category=c("a","a","b","b","b","c","c","c")
A1=c(1,2,3,4,3,2,1,2)
A2=c(10,11,12,13,14,15,16,17)
tt=cbind(Category,A1,A2)
tdat=data.frame(tt)
colnames(tdat)=c("Category","M1","M2")
ll=matrix(1:2,nrow=2)
for(i in 1:nrow(ll)) {
  Aone=tdat %>% group_by(Category) %>%
    summarize(Msum=sum(paste("M",i,sep="")))
}

I end up the following error

x invalid 'type' (character) of argument
ℹ Input Msum is sum(paste("M", i, sep = "")).
ℹ The error occurred in group 1: Category = "A".
Run rlang::last_error() to see where the error occurred.```


The goal is to iteratively get arithmentic functions within summarize function in dplyr. But this concatenated string is not recognized as column name. 

Solution

  • If we want to pass a string as column name, then convert to symbol and evaluate (!!)

    library(dplyr)
    Aone <- vector('list', nrow(ll))
    for(i in seq_len(nrow(ll))) {
          Aone[[i]] <- tdat %>%
                        group_by(Category) %>%
                        summarize(Msum = sum(!! rlang::sym(paste("M", i, sep=""))))
        }
    

    Or assuming the column name is 'M-1', 'M-2', etc, it should work as well

    Aone <- vector('list', 2)
    for(i in seq_along(Aone)) {
       Aone[[i]] <- tdat %>%
            group_by(Category) %>% 
           summarise(Msum = sum(!! rlang::sym(paste("M-", i, sep=""))), 
             .groups = 'drop')
      }
    

    NOTE: The ll was not clear in the original post. Here, we create a list with length equal to the number of 'M-' columns and assign the output back to the list element by looping over the sequence of that list

    data

    tdat <- data.frame(Category, M1, M2)
    
    
    tdat <- structure(list(Category = c("A", "A", "A", "A", "B", "B", "B", 
    "B"), `M-1` = c(1, 2, 3, 4, 3, 2, 1, 2), `M-2` = c(10, 11, 12, 
    13, 14, 15, 16, 17)), class = "data.frame", row.names = c(NA, 
    -8L))