Search code examples
rjoindplyrdbplyr

How do you do rolling joins with database tables in R?


The dev version of dplyr allows for rolling joins.

# devtools::install_github("/tidyverse/dplyr")
library(dplyr)

a <- tibble(
  person = rep("a", 10),
  time = 1:10
) 

b <- tibble(
  person = rep("a", 3),
  time = c(1, 8, 5),
  pref = c("good", "terrible", "not so good")
)

left_join(a, b, by = join_by(person, closest(time >= time)))
#> # A tibble: 10 × 4
#>    person time.x time.y pref       
#>    <chr>   <int>  <dbl> <chr>      
#>  1 a           1      1 good       
#>  2 a           2      1 good       
#>  3 a           3      1 good       
#>  4 a           4      1 good       
#>  5 a           5      5 not so good
#>  6 a           6      5 not so good
#>  7 a           7      5 not so good
#>  8 a           8      8 terrible   
#>  9 a           9      8 terrible   
#> 10 a          10      8 terrible

However, the same approach does not (at least not yet) work with dbplyr.

library(dbplyr)
library(dplyr)

a <- memdb_frame(
  person = rep("a", 10),
  time = 1:10
) 

b <- memdb_frame(
  person = rep("a", 3),
  time = c(1, 8, 5),
  pref = c("good", "terrible", "not so good")
)

left_join(a, b, by = join_by(person, closest(time >= time)))
#> Error in `dplyr::common_by()`:
#> ! `by` must be a (named) character vector, list, or NULL for natural
#>   joins (not recommended in production code), not a <dplyr_join_by> object.`

How would you do this type of rolling join with database tables in R?


Solution

  • One option is to make use of sql_on argument in the left_join

    library(dbplyr)
    library(dplyr)
    left_join(a, b, sql_on =  "LHS.person = RHS.person and LHS.time >= RHS.time") %>% 
       select(-person.y) %>% 
       group_by(person.x, time.x) %>%
       slice_max(time.y) %>%
       ungroup
    

    -output

    # Source:   SQL [10 x 4]
    # Database: sqlite 3.39.2 [:memory:]
       person.x time.x time.y pref       
       <chr>     <int>  <dbl> <chr>      
     1 a             1      1 good       
     2 a             2      1 good       
     3 a             3      1 good       
     4 a             4      1 good       
     5 a             5      5 not so good
     6 a             6      5 not so good
     7 a             7      5 not so good
     8 a             8      8 terrible   
     9 a             9      8 terrible   
    10 a            10      8 terrible