Search code examples
rdbplyr

How to "arrange" aggregate variable in dbplyr?


The following dbplyr statement fails:

foo <- activity_viewed %>% group_by(pk) %>% summarize(total = n()) %>%
  arrange(-total) %>% head(3) %>% collect()

with this error:

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "total" does not exist
LINE 4: ORDER BY -"total"
                  ^
)

I can see the problem in the query: SQL doesn't allow the ORDER BY to use column aliases.

Here's the generated query:

> print(show_query(foo))
<SQL>
SELECT "pk", COUNT(*) AS "total"
FROM "activity"
GROUP BY "pk"
ORDER BY -"total"
LIMIT 3

I need ORDER BY -COUNT(*).

How do I get dbplyr to execute this query?


Solution

  • dbplyr can translate desc but not -

    library(dplyr) 
    library(dbplyr) 
    mtcars2 <- src_memdb() %>% 
               copy_to(mtcars, name = "mtcars2-cc", overwrite = TRUE)
    mtcars2 %>% arrange(desc(cyl)) %>% show_query()
    <SQL>
    SELECT *
    FROM `mtcars2-cc`
    ORDER BY `cyl` DESC