Search code examples
rdplyrdbplyr

Handle dates with dbplyr using pure R


dbplyr translates dplyr and base R commands into SQL so that developers can write R code and have it executed in a database (tidyverse reference). When working with dates in R, it is common to use the lubridate package. However, dbplyr translations for lubridate functions do not exist at present. So developers using dbplyr need to find alternative ways of working with dates.

My previous approach has been to use fragments of SQL syntax in my dplyr commands (see for example answers: here and here). However, this requires the developer to know (or find out) the corresponding SQL commands, when part of the point of dbplyr is that it translates to SQL for you.

This makes me ask: What is the best way to manipulate dates using dbplyr translation only when connecting to a remote database?

The ideal solution will:

  • Use only dbplyr translation, so functions without a dbplyr translation cannot be used.
  • Use pure R, no SQL fragments.
  • Run on the database, so a remote table instead of a local table.

I think at a minimum we should be able to:

  • extract year, month, and day
  • combine year, month, and day into a new date

From these you can manually do other operations such as:

  • Increment a date
  • Find the difference between two dates
  • Find the date of the end of the month

But faster/elegant ways to do these more advance operations would be preferable.


Solution

  • One answer is that much of this is already possible. (See answer here.)

    To the extent that desired functions are missing from dbplyr, one idea would be to write pull requests that add more translations of lubridate functions to the backends in dbplyr.

    It seems that translations are inevitably backend-specific. If you look at the PostgreSQL backend here, you can see that some lubridate functions (e.g., month or quarter) are given translations there, but others (e.g., ymd) are not.