Search code examples
rsql-serverdbplyr

Count TRUE/FALSE values per row of a SQL table using dbplyr from R


I do have a remote connection to a SQL table using dbplyr. One of the table is composed of one ID column and several other columns storing 0 and 1 values (SQL bit - interpreted as boolean TRUE/FALSE values from R side) and from R I simply want to get the total number of 1 for each row.

It is straightforward in R with an usual table using for instance rowSums() which unfortunately does not work through dbplyr (no SQL equivalent).

For obvious reasons due to the size of the underlying table I do not want to collect() the data.

How one could achieve that in such a context?

library(dplyr)
# Local case
DF <- tibble(ID = LETTERS[1:3], col1 = c(1,1,1), col2 = c(1,1,0), col3 = c(1,0,0))
DF %>% 
  summarise(sum = rowSums(select(., -1)))
#   sum
# 1   3
# 2   2
# 3   1

# If DF is a remote SQL table, therefore one would get the following error message:  
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]'rowSums' is not a recognized built-in function name.  [Microsoft][ODBC SQL Server Driver][SQL Server]

EDIT - ADDING MINIMAL REPRODUCIBLE EXAMPLE

Following @Simon.S.A. reply, below a MRE:

# Table creation
DF <- tibble(ID = LETTERS[1:3], col1 = c(1,1,1), col2 = c(1, 1,0), col3 = c(1,0,0))
colnames(DF) <- c("col 1", "col 2", "col 3", "col 4")
# SQL simulation
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, DF)
con %>% tbl("DF") # just checking
#preparing formula
cols <- colnames(DF)[-1]
all_equations <- paste0("`", cols, "` =  sum(`", cols,"`)")
# actual query
con %>% 
  tbl("DF") %>% 
  summarise(!!!rlang::parse_exprs(all_equations))
# Error: near "=": syntax error
# %>% show_query() shows a strange query, but I am no SQL expert as you understood.
# also tried: 
# all_equations <- paste(cols ,"=  sum(",cols,")")
# all_equations <- paste0("`[", cols, "]` =  sum(`[", cols,"]`)")

Solution

  • Part of the challenge here is that dbplyr translates dplyr commands into SQL, but translations are only defined for some R commands. As translations exists for the standard dplyr commands, we can use summarise.

    To summarise all in one go, we can do the following:

    library(dplyr)
    library(rlang)
    
    cols = colnames(DF)
    cols = cols[2:length(cols)]
    
    all_equations = paste(cols ,"=  sum(",cols,")")
    
    
    DF %>%
      summarise(!!!parse_exprs(all_equations))
    

    The idea is to build text strings of every sum, and then use !!!parse_exprs(.) to turn this text into R code.

    EDIT - Same approach but for row sums

    # Table creation
    DF <- tibble(ID = LETTERS[1:3], col1 = c(1,1,1), col2 = c(1, 1,0), col3 = c(1,0,0))
    colnames(DF) <- c("col 1", "col 2", "col 3", "col 4")
    # SQL simulation
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, DF)
    con %>% tbl("DF") # just checking
    #preparing formula
    cols <- colnames(DF)[-1]
    eq <- paste0("`",paste0(cols, collapse = "` + `"),"`")
    # actual query
    con %>% 
      tbl("DF") %>% 
      mutate(new = !!parse_expr(eq))
    

    But still depends on dbplyr translation and so may not handle back-ticks correctly.