I am working with the R programming language. I am trying to convert "dplyr/dbplyr" code into SQL code using the "show_query()" option.
For example, I tried to run the following code:
#first code
library(dplyr)
library(dbplyr)
data(iris)
iris %>%
filter(Species == "setosa") %>%
summarise(mean.Sepal.Length = mean(Sepal.Length),
mean.Petal.Length = mean(Petal.Length)) %>% show_query()
However, this returned the following error (note: when you remove "show_query()", the above code actually runs):
Error in UseMethod("show_query") :
no applicable method for 'show_query' applied to an object of class "data.frame"
I think I found a solution to this problem:
#second code
> con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
> flights <- copy_to(con, iris)
> flights %>%
filter(Species == "setosa") %>%
summarise(mean.Sepal.Length = mean(Sepal.Length),
mean.Petal.Length = mean(Petal.Length)) %>% show_query()
<SQL>
SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
FROM `iris`
WHERE (`Species` = 'setosa')
Warning message:
Missing values are always removed in SQL.
Use `mean(x, na.rm = TRUE)` to silence this warning
This warning is displayed only once per session.
Can someone please tell me why the original code I tried did not work, but the second code is working? Why is it necessary to establish a connection and add the "copy_to" statement - even if I want to run something locally? I am just curious to convert DPLYR code into SQL - at this point, I just want to run everything locally, and not connect to a remote database. Thus, why do I need to establish a connection if I want to run this locally? Why does the show_query() statement not work in the original code?
show_query()
translates the dplyr
syntax into query code for the backend you are using.
A database
backend using dbplyr
will result in an SQL
query (as a data.table
backend using dtplyr will result in a DT[i,j,by]
query).
show_query
doesn't need to have a method to translate dplyr
syntax applied to a data.frame
backend to itself, hence the error message you're getting.
An easy way to get an SQL
query result is to transform the data.frame
into an in-memory database with memdb_frame
:
memdb_frame(iris) %>%
filter(Species == "setosa") %>%
summarise(mean.Sepal.Length = mean(Sepal.Length),
mean.Petal.Length = mean(Petal.Length)) %>% show_query()
<SQL>
SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
FROM `dbplyr_002`
WHERE (`Species` = 'setosa')