Search code examples
rloopsdateif-statementmatching

How to pick the nearest date above a certain date in one dataframe from another


So I have two dataframes, "df1" and "df2" (see code below). I want to create a new variable in "df1" by iterating trough the elDate variable in "df2", and picking the closest elDate value above the date value in "df1".

For example, the first row in "df1" is Albania with the date 1996-12-24. I want the value in the new variable to be 1997-06-29, which is the closest elDate in "df2" above date in "df1".

The main thing is to stop at the first elDate which is above date.

library(tidyverse)
library(lubridate)

df1 <- tibble(country = c(rep("Albania", 11), rep("Algeria", 7)),
                    date = ymd("1996-12-24", "1997-01-30", "2009-07-30", "2011-07-08", "2012-04-18", "2012-01-20", "2013-05-16", "2016-03-03", "2017-05-11", "2018-09-07", "2022-05-31", "2005-01-10", "2006-07-12", "2012-10-09", "2012-11-15", "2014-04-18", "2017-06-07", "2017-01-24"))


df2 <- tibble(country = c(rep("Albania", 9), rep("Algeria", 6)),
                          elDate = ymd("2025-04-25", "2021-04-25", "2017-06-25", "2013-06-25", "2009-06-28", "2005-07-03", "2001-07-08", "1997-06-29", "1996-05-26", "2021-06-12", "2017-05-04", "2012-05-10", "2007-05-17", "2007-05-30", "2002-05-30"))

I want the new dataframe to look like this:

country date elDate
Albania 24-12-1996 29-06-1997
Albania 30-01-1997 29-06-1997
Albania 30-07-2009 25-06-2013
Albania 08-07-2011 25-06-2013
Albania 18-04-2012 25-06-2013
Albania 20-01-2012 25-06-2013
Albania 16-05-2013 25-06-2013
Albania 03-03-2016 25-06-2017
Albania 11-05-2017 25-06-2017
Albania 07-09-2018 25-04-2021
Albania 31-05-2022 24-04-2025
Algeria 10-01-2005 30-05-2007
Algeria 12-07-2006 30-05-2007
Algeria 09-10-2012 04-05-2017
Algeria 15-11-2012 04-05-2017
Algeria 18-04-2014 04-05-2017
Algeria 07-06-2017 12-06-2021
Algeria 24-01-2017 12-06-2021
Algeria 07-06-2017 12-06-2021

Suggestions are much appreciated!

Thanks!


Solution

  • You can try inequality joins in dplyr:

    library(dplyr) #1.1.0 and above
    
    df1 %>% 
      left_join(df2, by = join_by(country, closest(date <= elDate)))
    
    # A tibble: 18 × 3
       country date       elDate    
       <chr>   <date>     <date>    
     1 Albania 1996-12-24 1997-06-29
     2 Albania 1997-01-30 1997-06-29
     3 Albania 2009-07-30 2013-06-25
     4 Albania 2011-07-08 2013-06-25
     5 Albania 2012-04-18 2013-06-25
     6 Albania 2012-01-20 2013-06-25
     7 Albania 2013-05-16 2013-06-25
     8 Albania 2016-03-03 2017-06-25
     9 Albania 2017-05-11 2017-06-25
    10 Albania 2018-09-07 2021-04-25
    11 Albania 2022-05-31 2025-04-25
    12 Algeria 2005-01-10 2007-05-17
    13 Algeria 2006-07-12 2007-05-17
    14 Algeria 2012-10-09 2017-05-04
    15 Algeria 2012-11-15 2017-05-04
    16 Algeria 2014-04-18 2017-05-04
    17 Algeria 2017-06-07 2021-06-12
    18 Algeria 2017-01-24 2017-05-04