I do have a remote connection to a SQL table using dbplyr.
One of the table is composed of one ID column and several other columns storing 0 and 1 values (SQL bit
- interpreted as boolean TRUE/FALSE values from R side) and from R I simply want to get the total number of 1 for each row.
It is straightforward in R with an usual table using for instance rowSums()
which unfortunately does not work through dbplyr (no SQL equivalent).
For obvious reasons due to the size of the underlying table I do not want to collect()
the data.
How one could achieve that in such a context?
library(dplyr)
# Local case
DF <- tibble(ID = LETTERS[1:3], col1 = c(1,1,1), col2 = c(1,1,0), col3 = c(1,0,0))
DF %>%
summarise(sum = rowSums(select(., -1)))
# sum
# 1 3
# 2 2
# 3 1
# If DF is a remote SQL table, therefore one would get the following error message:
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]'rowSums' is not a recognized built-in function name. [Microsoft][ODBC SQL Server Driver][SQL Server]
EDIT - ADDING MINIMAL REPRODUCIBLE EXAMPLE
Following @Simon.S.A. reply, below a MRE:
# Table creation
DF <- tibble(ID = LETTERS[1:3], col1 = c(1,1,1), col2 = c(1, 1,0), col3 = c(1,0,0))
colnames(DF) <- c("col 1", "col 2", "col 3", "col 4")
# SQL simulation
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, DF)
con %>% tbl("DF") # just checking
#preparing formula
cols <- colnames(DF)[-1]
all_equations <- paste0("`", cols, "` = sum(`", cols,"`)")
# actual query
con %>%
tbl("DF") %>%
summarise(!!!rlang::parse_exprs(all_equations))
# Error: near "=": syntax error
# %>% show_query() shows a strange query, but I am no SQL expert as you understood.
# also tried:
# all_equations <- paste(cols ,"= sum(",cols,")")
# all_equations <- paste0("`[", cols, "]` = sum(`[", cols,"]`)")
Part of the challenge here is that dbplyr translates dplyr commands into SQL, but translations are only defined for some R commands. As translations exists for the standard dplyr commands, we can use summarise
.
To summarise all in one go, we can do the following:
library(dplyr)
library(rlang)
cols = colnames(DF)
cols = cols[2:length(cols)]
all_equations = paste(cols ,"= sum(",cols,")")
DF %>%
summarise(!!!parse_exprs(all_equations))
The idea is to build text strings of every sum, and then use !!!parse_exprs(.)
to turn this text into R code.
EDIT - Same approach but for row sums
# Table creation
DF <- tibble(ID = LETTERS[1:3], col1 = c(1,1,1), col2 = c(1, 1,0), col3 = c(1,0,0))
colnames(DF) <- c("col 1", "col 2", "col 3", "col 4")
# SQL simulation
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, DF)
con %>% tbl("DF") # just checking
#preparing formula
cols <- colnames(DF)[-1]
eq <- paste0("`",paste0(cols, collapse = "` + `"),"`")
# actual query
con %>%
tbl("DF") %>%
mutate(new = !!parse_expr(eq))
But still depends on dbplyr translation and so may not handle back-ticks correctly.