R: Import values from one dataframe to another by matching dates

Assume I have df1:

Start_Date    End_Date     Value
2001-01-01    2001-12-31   1
2002-01-01    2002-12-31   2
2003-01-01    2003-12-31   3
2004-01-01    2004-12-31   4
2005-01-01    2005-12-31   5

& df2:

DateTime      Gain   People
2003-01-01    3      3
2003-05-09    5      4
2004-12-31    1      2
2005-01-31    -2     2
2005-08-13    9      7
2006-09-10    6      8
2007-10-03    7      5

What I would like to do is to import Value from df1 into df2 by checking which DateTime in df2 falls in between Start_Date & End_Date in df1. If dates do not fall into any of periods in df1, return Value 0. A view of desired outcome:

DateTime      Gain   People   Value
2003-01-01    3      3        3
2003-05-09    5      4        3
2004-12-31    1      2        4
2005-01-31    -2     2        5
2005-08-13    9      7        5
2006-09-10    6      8        0
2007-10-03    7      5        0

Please advise

Solution

The underlying challenge is joining on inequality conditions. These are straightforward in sql, but not in R as dplyr only joins on equality. Here is the a link to the primary answer on S.O.

The best way I have found is to do a more general join and then filter for the inequalities.

df1 = df1 %>% mutate(join_id = 1)
df2 = df2 %>% mutate(join_id = 1)

output = left_join(df2, df1, by = 'join_id') %>%
  filter(Start_Date <= DateTime,
         DateTime <= End_Date) %>%
  select(DateTime, Gain, People, Value) %>%
  mutate(Value = ifelse(is.na(Value), 0, Value)

Explanation:

As your dataframes do not have an existing ID column we first create one. This is unnecessary if you want to join on some equality constraints and some inequality constraints.
We use a (much) more general join followed by a filter for the inequality constraints.
ifelse(is.na(... is used to replace the missing values.

In some cases this much more general join can create performance issues. But as R uses lazy-evaluation by default, if you filter immediately after the join then R should run both statements as part of the same join and avoid any performance problem.