Search code examples
rdplyrdbplyr

How can I dynamically build a string and pass it to dplyr's mutate() function in R?


I asked a similar question before (Link). The given answer works fine. However, it turns out, that it does not fully apply to my use case.

Please consider the following minimal working example:

library(RSQLite)
library(dplyr)
library(dbplyr)
library(DBI)
library(stringr)

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")

copy_to(con, mtcars, "mtcars", temporary = FALSE)

db <- tbl(con, "mtcars") %>%
    select(carb) %>%
    distinct(carb) %>%
    arrange(carb) %>% 
    mutate(Q1=1, Q2=2, Q3=3, Q4=4) %>% 
    collect()

I am interested in dynamically building the string Q1=1, Q2=2, Q3=3, Q4=4 such that it could be Q1=1, Q2=2, ..., Qn = n.

One idea I had is to build the string like that:

n_par <- 4
str_c('Q', 1:n_par, ' = ', 1:n_par, collapse = ', ')

such that n_par could be any positive number. However, due to dplyr's non-standard evaluation, I cannot make it work like that. However, this is exactly what I need.

Can somebody help?


Solution

  • Generating and evaluating the string

    Q1 = 1, Q2 = 2, Q3 = 3, Q4 = 4 is not a string in the same way that "Q1 = 1, Q2 = 2, Q3 = 3, Q4 = 4" is a string. There are some R functions that will take a string object and evaluate it as code. For example:

    > eval(parse(text="print('hello world')"))
    
    #> [1] "hello world"
    

    However, this may not play nicely inside dbplyr translation. If you manage to get something like this approach working it would be good to see it posted as an answer.

    Using a loop

    Instead of doing it as a single string, an alternative is to use a loop:

    db <- tbl(con, "mtcars") %>%
        select(carb) %>%
        distinct(carb) %>%
        arrange(carb)
    
    for(i in 1:n){
        var = paste0("Q",i)
        db <- db %>%
            mutate(!!sym(var) := i)
    }
    
    db <- collect(db)
    

    The !!sym() is required to tell dplyr that you want the text argument treated as a variable. Lazy evaluation can give you odd results without it. The := assignment is required because the LHS needs to be evaluated.

    This approach is roughly equivalent to one mutate statement for each variable (example below), but the dbplyr translation might not look as elegant as doing it all within a single mutate statement.

    db <- tbl(con, "mtcars") %>%
        select(carb) %>%
        distinct(carb) %>%
        arrange(carb) %>%
        mutate(Q1 = 1) %>%
        mutate(Q2 = 2) %>%
        ...
        mutate(Qn = n) %>%
        collect()