Search code examples
rdbplyr

'no applicable method' for applying dbplyr's sql_render to a data.frame


I was testing an example from RStudio about "render SQL code" from dbplyr:

library(nycflights13)
ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))

dbplyr::sql_render(ranked)

But when run, it returns the following error message:

Error in UseMethod("sql_render") : no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"

Can someone explain why?


Solution

  • When you are working on a "normal" data.frame, then it returns a frame, in which case sql_render is inappropriate (and will be very confused). If we work with just your initial code, then we can see that SQL has nothing to do with it:

    library(dplyr)
    library(nycflights)
    ranked <- flights %>%
      group_by(year, month, day) %>%
      select(dep_delay) %>% 
      mutate(rank = rank(desc(dep_delay)))
    ranked
    # # A tibble: 336,776 x 5
    # # Groups:   year, month, day [365]
    #     year month   day dep_delay  rank
    #    <int> <int> <int>     <dbl> <dbl>
    #  1  2013     1     1         2  313 
    #  2  2013     1     1         4  276 
    #  3  2013     1     1         2  313 
    #  4  2013     1     1        -1  440 
    #  5  2013     1     1        -6  742 
    #  6  2013     1     1        -4  633 
    #  7  2013     1     1        -5  691 
    #  8  2013     1     1        -3  570 
    #  9  2013     1     1        -3  570 
    # 10  2013     1     1        -2  502.
    # # ... with 336,766 more rows
    

    But dbplyr won't be able to do something with that:

    library(dbplyr)
    sql_render(ranked)
    # Error in UseMethod("sql_render") : 
    #   no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"
    

    If, however, we have that same flights data in a database, then we can do what you are expecting, with some minor changes.

    # pgcon <- DBI::dbConnect(odbc::odbc(), ...)     # to my local postgres instance
    copy_to(pgcon, flights, name = "flights_table")  # go get some coffee
    
    flights_db <- tbl(pgcon, "flights_table")
    ranked_db <- flights_db %>%
      group_by(year, month, day) %>%
      select(dep_delay) %>% 
      mutate(rank = rank(desc(dep_delay)))
    # Adding missing grouping variables: `year`, `month`, `day`
    

    We can see some initial data, showing the top 10 rows of what the query will eventually return:

    ranked_db
    # # Source:   lazy query [?? x 5]
    # # Database: postgres [postgres@localhost:/]
    # # Groups:   year, month, day
    #     year month   day dep_delay    rank
    #    <int> <int> <int>     <dbl> <int64>
    #  1  2013     1     1        NA       1
    #  2  2013     1     1        NA       1
    #  3  2013     1     1        NA       1
    #  4  2013     1     1        NA       1
    #  5  2013     1     1       853       5
    #  6  2013     1     1       379       6
    #  7  2013     1     1       290       7
    #  8  2013     1     1       285       8
    #  9  2013     1     1       260       9
    # 10  2013     1     1       255      10
    # # ... with more rows
    

    and we can see what the real SQL query looks like:

    sql_render(ranked_db)
    # <SQL> SELECT "year", "month", "day", "dep_delay", RANK() OVER (PARTITION BY "year", "month", "day" ORDER BY "dep_delay" DESC) AS "rank"
    # FROM "flights_table"
    

    Realizing that, due to the way dbplyr operates, we don't know how many rows will be returned until we collect it:

    nrow(ranked_db)
    # [1] NA
    
    res <- collect(ranked_db)
    nrow(res)
    # [1] 336776
    
    res
    # # A tibble: 336,776 x 5                 # <--- no longer 'Source:   lazy query [?? x 5]'
    # # Groups:   year, month, day [365]
    #     year month   day dep_delay    rank
    #    <int> <int> <int>     <dbl> <int64>
    #  1  2013     1     1        NA       1
    #  2  2013     1     1        NA       1
    #  3  2013     1     1        NA       1
    #  4  2013     1     1        NA       1
    #  5  2013     1     1       853       5
    #  6  2013     1     1       379       6
    #  7  2013     1     1       290       7
    #  8  2013     1     1       285       8
    #  9  2013     1     1       260       9
    # 10  2013     1     1       255      10
    # # ... with 336,766 more rows