I am trying add information from a second data set to my first based on ID and dates. If the ID matches and 'Date' is between 'start' and 'end', I want to add the value for colour to df1.
df1
ID Date
1 3/31/2017
2 2/11/2016
2 4/10/2016
3 5/15/2015
df2
ID start end colour
1 1/1/2000 3/31/2011 blue
1 4/1/2011 6/4/2012 purple
1 6/5/2012 3/31/2017 blue
2 5/1/2014 3/31/2017 red
3 1/12/2012 2/12/2014 purple
To get a result like this:
dat
ID Date colour
1 3/31/2017 blue
2 2/11/2016 red
2 4/10/2016 red
3 5/15/2015 NA
Which can be created with the code here:
library(lubridate)
df1 <- tibble(ID = c(1,2,2,3), Date = mdy(c("3/31/2017","2/11/2016","4/10/2016","5/15/2015")))
df2 <- tibble(ID = c(1,1,1,2,3), start = mdy(c("1/1/2000","4/1/2011","6/5/2012","5/1/2014","1/12/2012")), end = mdy(c("3/31/2011","6/4/2012","3/31/2017","3/31/2017","2/12/2014")), colour = c("blue", "purple", "blue", "red", "purple"))
I used a response from a similar question, Checking if Date is Between two Dates in R and used the code below:
library(dplyr)
dat <- inner_join(df1, df2, by = "ID")
dat %>% rowwise() %>%
mutate(match = ifelse(between(df1$Date, df2$start, df2$end), 1 , 0))%>%
select(-c(df2$start, df2$end))%>%
arrange(df1$Date, desc(match))%>%
distinct(df1$Date)
and I get the following error:
Error in between(df1$Date, df2$start, df2$end) : Expecting a single value: [extent=355368].
help?
Thanks so much!
Update-
Thanks so much everyone for your answers.
I tried them all but all the final datasets have different number of rows than the first dataset. I am not sure what is happening. The data I have posted is made-up to resemble the data I am working with. Are there additional details that I should let you know? I don't know where to start...
dplyr
uses non standard evaluation and so you can dump all the dataframe names and $
s and your code begins basically in the right direction. There are additionally a number of implicit transformations necessary for you to end up with the data frame you specified, but this below will get you there.
dat <-
df1 %>%
inner_join(df2) %>%
rowwise %>%
mutate(match = ifelse(between(Date, start, end), 1 , NA)) %>%
arrange(ID, Date, desc(match)) %>%
ungroup %>%
group_by(ID, Date) %>%
mutate(best = row_number(ID),
colour = if_else(is.na(match), NA_character_, colour)) %>%
filter(best == 1) %>%
select(ID, Date, colour)
> dat # A tibble: 4 x 3 # Groups: ID, Date [4] ID Date colour <dbl> <date> <chr> 1 1 2017-03-31 blue 2 2 2016-02-11 red 3 2 2016-04-10 red 4 3 2015-05-15 <NA>