Search code examples
rdatedplyrlubridate

Create a filter for dates with a condition


I'd like to create a filter data frame operation using dates and a categorical factor (stand). I have raw data in a mydf data frame and a conditional data frame too (mycond). I'd like to create a new data frame (newdf) with the condition if the categorical factor (mydf$stand) is present inside (mycond$stand) then take all the mycond$stand rows in the dates until and before (=<) the (mycond$date) for this categorical factor. If the categorical factor is not in mycond$stand, don't make anything and take all the rows for this factor. In my example:

#Packages
library(dplyr)
library(lubridate) 

# Create my complete data frame
N<-100
ID<-1:N
stand <- rep(c("KBFAS1755G","DDHOF8674C","WJFZM8772L","EOFDS2812H","VMZWG2258I"),20)
variable <- rnorm(N)
mydf <- data.frame(ID=ID,stand=stand,variable=variable)
mydf$dates <- sample(seq(as.Date('2019/06/01'), Sys.Date(), by="day"), N)
mydf <- mydf %>%
         mutate(dates = ymd(dates)) 
str(mydf) 

#'data.frame':  100 obs. of  5 variables:
# $ ID      : chr  "1" "2" "3" "4" ...
# $ stand   : chr  "KBFAS1755G" "DDHOF8674C" "WJFZM8772L" "EOFDS2812H" ...
# $ date    : Date, format: NA NA NA NA ...
# $ variable: chr  "-1.07890610087943" "0.290143376807384" "0.395138836710153" 
#"-0.310578696329384" ...
# $ dates   : Date, format: "2021-02-23" "2020-04-23" "2019-04-03" "2020-02-19" ...   

 

# Create my conditional data frame
stand<-c("KBFAS1755G","DDHOF8674C","EOFDS2812H")
mycond <- data.frame(stand=stand)
mycond$dates<-c('2021/05/03','2021/01/01','2021/02/12')
mycond <- mycond %>%
         mutate(dates = ymd(dates)) 
str(mycond)

#'data.frame':  3 obs. of  2 variables:
# $ stand: chr  "KBFAS1755G" "DDHOF8674C" "EOFDS2812H"
# $ dates: Date, format: "2021-05-03" "2021-01-01" "2021-02-12"

# Create a new data frame data before dates filter() or another approach
newdf %>% group_by(stand,dates) %>% dplyr::filter(...)

Here I have a brain frozen, because a need a new data frame (newdf) with the data until before the dates in mycond. Please, any ideas for solving it?


Solution

  • You can join the two datasets by 'stand' and keep rows if dates in mydf is less than equal to dates in mycond.

    library(dplyr)
    
    mydf %>% inner_join(mycond, by = 'stand') %>% filter(dates.x <= dates.y)
    

    In base R -

    subset(merge(mydf, mycond, by = 'stand'), dates.x <= dates.y)
    

    You can avoid the filter step if you use fuzzyjoin where you can join by range.

    fuzzyjoin::fuzzy_inner_join(mydf, mycond, 
                by = c('stand', 'dates'), match_fun = c(`==`, `<=`))