Tags: r, dplyr, lapply, sparklyr, nse

Using dplyr's enquo to access Spark table columns via sparklyr


I would like to be able to use dplyr's enquo within an lapply call while iterating over the columns of a Spark table.

lapply(tbl_vars(sprkTbl),
       function(col_nme) {
           print(col_nme)
           # Enquo column name
           quo_col_nme <- enquo(col_nme)
           print(quo_col_nme)

           sprkTbl %>%
               select(!!quo_col_nme) %>% 
               # do stuff
               collect -> dta_res
       }) -> l_res

However, when I try to run this code I keep getting the error:

Error in (function (x, strict = TRUE) : the argument has already been evaluated

I've isolated the error to enquo:

>> lapply(tbl_vars(sprkTbl),
...        function(col_nme) {
...            print(col_nme)
...            # Enquo column name
...            quo_col_nme <- enquo(col_nme)
...            # print(quo_col_nme)
...            
...            # sprkTbl%>%
...            #     select(!!quo_col_nme) %>% 
...            #     # do stuff
...            #     collect -> dta_res
...        }) -> l_res
[1] "first_column_in_spark"

(and then the same error)

Error in (function (x, strict = TRUE) : the argument has already been evaluated

I want to understand why enquo can't be used like that. tbl_vars returns an ordinary character vector, so shouldn't col_nme simply be a string? I would expect the syntax to work in the same manner as in:

mtcars %>% select(!!enquote("am")) %>% head(2)
              am
Mazda RX4      1
Mazda RX4 Wag  1

but clearly this is not the case when enquo is called from within lapply.
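
For contrast, here is a minimal sketch of the situation enquo is designed for, as far as I understand it: capturing the expression that a caller typed for a function argument. The helper name first_rows and its arguments are made up for illustration. Inside lapply there is no such typed expression, because the function is handed a string value directly, which seems consistent with the "already been evaluated" message.

```r
library(dplyr)

# Hypothetical helper: the caller passes `col` as a bare column name.
first_rows <- function(df, col) {
  col_quo <- enquo(col)   # captures the expression `am`, not a value
  df %>%
    select(!!col_quo) %>%
    head(2)
}

# `am` is written unquoted, so enquo has an expression to capture.
res_am <- first_rows(mtcars, am)
print(res_am)
```

Here the call first_rows(mtcars, am) supplies an unevaluated symbol, which is what enquo expects; a character string produced by names() or tbl_vars() is a different kind of input.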


Edit

Leaving the sparklyr aspect aside, a better and more reproducible example can be provided:

lapply(names(mtcars),function(x) {
    col_enq <- enquo(x)
    mtcars %>% 
        select(!!col_enq) %>% 
        head(2)
})

This produces the identical error.

Desired results

The old underscore-based syntax (select_) works:

lapply(names(mtcars),function(x) {
    # col_enq <- enquo(x)
    mtcars %>% 
        select_(x) %>% 
        head(2)
})

In short, I want to achieve the same functionality while iterating over Spark table columns, and I would prefer not to use the deprecated select_.


Solution

  • Do I understand your question correctly, in that this is the result you are after? Or are you required to use enquo rather than quo?

    library(dplyr)
    
    lapply(names(mtcars),function(x) {
      col_enq <- quo(x)
      mtcars %>% 
        select(!!col_enq) %>% 
        head(2)
    })
    #> [[1]]
    #>               mpg
    #> Mazda RX4      21
    #> Mazda RX4 Wag  21
    #> 
    #> [[2]]
    #>               cyl
    #> Mazda RX4       6
    #> Mazda RX4 Wag   6
    #> 
    #> [[3]]
    #>               disp
    #> Mazda RX4      160
    #> Mazda RX4 Wag  160
    #> 
    #> [[4]]
    #>                hp
    #> Mazda RX4     110
    #> Mazda RX4 Wag 110
    #> 
    #> [[5]]
    #>               drat
    #> Mazda RX4      3.9
    #> Mazda RX4 Wag  3.9
    #> 
    #> [[6]]
    #>                  wt
    #> Mazda RX4     2.620
    #> Mazda RX4 Wag 2.875
    #> 
    #> [[7]]
    #>                qsec
    #> Mazda RX4     16.46
    #> Mazda RX4 Wag 17.02
    #> 
    #> [[8]]
    #>               vs
    #> Mazda RX4      0
    #> Mazda RX4 Wag  0
    #> 
    #> [[9]]
    #>               am
    #> Mazda RX4      1
    #> Mazda RX4 Wag  1
    #> 
    #> [[10]]
    #>               gear
    #> Mazda RX4        4
    #> Mazda RX4 Wag    4
    #> 
    #> [[11]]
    #>               carb
    #> Mazda RX4        4
    #> Mazda RX4 Wag    4
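
Since the column names are already plain strings here, another option is to turn each string into a symbol with rlang::sym() and unquote it, or to pass the string to all_of(). The sketch below assumes rlang is installed (it is a dependency of dplyr) and, for all_of(), dplyr 1.0.0 or later; the variable names res_sym and res_all_of are only illustrative. I have not run this against a Spark table, so whether it carries over to sprkTbl unchanged is an assumption.

```r
library(dplyr)

# Option 1: convert the string to a symbol, then unquote it with !!
res_sym <- lapply(names(mtcars), function(x) {
  mtcars %>%
    select(!!rlang::sym(x)) %>%
    head(2)
})

# Option 2: tell select() explicitly that `x` holds column names
# (assumes dplyr >= 1.0.0, where all_of() is available)
res_all_of <- lapply(names(mtcars), function(x) {
  mtcars %>%
    select(all_of(x)) %>%
    head(2)
})

print(res_sym[[9]])
```

Both variants avoid the deprecated select_ and avoid enquo, which is not needed when no unevaluated argument has to be captured.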