Matching two date fields using dplyr

So I have a dataframe as such:

ID   key  date1
001   02  2018-02-16
001   02  2018-02-19
001   03  2018-02-17
001   03  2018-02-22
001   04  2017-01-01
002   11  2019-12-21
002   12  2019-12-21
002   13  2019-12-22

And another dataframe (DF2)

ID   key  date2
001   02  2018-02-20
001   03  2018-03-22
002   13  2019-12-22
002   13  2019-12-21

So the task is conceptually straightforward:

I would like to find the previous date that exists in DF1 to each date in DF2.

What do I mean by that? In DF2 for example, we see a date of 2018-02-20 as well as a corresponding Key and ID. So I go to DF1 and find the matching ID and Key, this gives me two possibilities. I need the one that's the previous, so not after. Thus it would be 2018-02-19. I will eventually calcuate the number of days.

The final df should look like this:

ID   key  date2           date1   day_diff
001   02  2018-02-20 2018-02-19          1
001   03  2018-03-22 2018-02-22         28
002   13  2019-12-22 2019-12-22          0
002   13  2019-12-21         NA         NA

Again, we just need the date previous to the one that is in each row of DF2. This also needs to return NA if there is no previous date.

Solution

Does this work:

library(dplyr)
df1 %>% group_by(ID, key) %>% filter(date1 == max(date1)) %>% 
fuzzyjoin::fuzzy_right_join(df2, by = c('ID' = 'ID', 'key' = 'key', 'date1' = 'date2'), match_fun = list(`==`, `==`, `<=`)) %>% 
ungroup() %>% select('ID' = ID.y, 'key' = key.y, date2, date1) %>% mutate(day_diff = as.numeric(date2 - date1))
# A tibble: 4 x 5
  ID    key   date2      date1      day_diff
  <chr> <chr> <date>     <date>        <dbl>
1 001   02    2018-02-20 2018-02-19        1
2 001   03    2018-03-22 2018-02-22       28
3 002   13    2019-12-22 2019-12-22        0
4 002   13    2019-12-21 NA               NA