Search code examples
sqlrdplyrdata-manipulationdbplyr

How to solve error "no applicable method for 'show_query' applied to an object of class "data.frame""


I am working with the R programming language. I am trying to convert "dplyr/dbplyr" code into SQL code using the "show_query()" option.

For example, I tried to run the following code:

#first code

library(dplyr)
library(dbplyr)

data(iris)

 iris %>% 
 filter(Species == "setosa")  %>% 
  summarise(mean.Sepal.Length = mean(Sepal.Length),
            mean.Petal.Length = mean(Petal.Length))  %>% show_query()

However, this returned the following error (note: when you remove "show_query()", the above code actually runs):

Error in UseMethod("show_query") : 
  no applicable method for 'show_query' applied to an object of class "data.frame"

I think I found a solution to this problem:

#second code

> con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

> flights <- copy_to(con, iris)

> flights %>% 
    filter(Species == "setosa")  %>% 
     summarise(mean.Sepal.Length = mean(Sepal.Length),
              mean.Petal.Length = mean(Petal.Length))  %>% show_query()


<SQL>
SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
FROM `iris`
WHERE (`Species` = 'setosa')

Warning message:
Missing values are always removed in SQL.
Use `mean(x, na.rm = TRUE)` to silence this warning
This warning is displayed only once per session. 

Can someone please tell me why the original code I tried did not work, but the second code is working? Why is it necessary to establish a connection and add the "copy_to" statement - even if I want to run something locally? I am just curious to convert DPLYR code into SQL - at this point, I just want to run everything locally, and not connect to a remote database. Thus, why do I need to establish a connection if I want to run this locally? Why does the show_query() statement not work in the original code?


Solution

  • show_query() translates the dplyr syntax into query code for the backend you are using.

    A database backend using dbplyr will result in an SQL query (as a data.table backend using dtplyr will result in a DT[i,j,by] query).

    show_query doesn't need to have a method to translate dplyr syntax applied to a data.frame backend to itself, hence the error message you're getting.

    An easy way to get an SQL query result is to transform the data.frame into an in-memory database with memdb_frame:

    memdb_frame(iris) %>% 
      filter(Species == "setosa")  %>% 
      summarise(mean.Sepal.Length = mean(Sepal.Length),
                mean.Petal.Length = mean(Petal.Length))  %>% show_query()
    
    <SQL>
    SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
    FROM `dbplyr_002`
    WHERE (`Species` = 'setosa')