I'd like to create a filter data frame operation using dates and a categorical factor (stand
). I have raw data in a mydf
data frame and a conditional data frame too (mycond
). I'd like to create a new data frame (newdf
) with the condition if the categorical factor (mydf$stand
) is present inside (mycond$stand
) then take all the mycond$stand
rows in the dates until and before (=<
) the (mycond$date
) for this categorical factor. If the categorical factor is not in mycond$stand
, don't make anything and take all the rows for this factor. In my example:
#Packages
library(dplyr)
library(lubridate)
# Create my complete data frame
N<-100
ID<-1:N
stand <- rep(c("KBFAS1755G","DDHOF8674C","WJFZM8772L","EOFDS2812H","VMZWG2258I"),20)
variable <- rnorm(N)
mydf <- data.frame(ID=ID,stand=stand,variable=variable)
mydf$dates <- sample(seq(as.Date('2019/06/01'), Sys.Date(), by="day"), N)
mydf <- mydf %>%
mutate(dates = ymd(dates))
str(mydf)
#'data.frame': 100 obs. of 5 variables:
# $ ID : chr "1" "2" "3" "4" ...
# $ stand : chr "KBFAS1755G" "DDHOF8674C" "WJFZM8772L" "EOFDS2812H" ...
# $ date : Date, format: NA NA NA NA ...
# $ variable: chr "-1.07890610087943" "0.290143376807384" "0.395138836710153"
#"-0.310578696329384" ...
# $ dates : Date, format: "2021-02-23" "2020-04-23" "2019-04-03" "2020-02-19" ...
# Create my conditional data frame
stand<-c("KBFAS1755G","DDHOF8674C","EOFDS2812H")
mycond <- data.frame(stand=stand)
mycond$dates<-c('2021/05/03','2021/01/01','2021/02/12')
mycond <- mycond %>%
mutate(dates = ymd(dates))
str(mycond)
#'data.frame': 3 obs. of 2 variables:
# $ stand: chr "KBFAS1755G" "DDHOF8674C" "EOFDS2812H"
# $ dates: Date, format: "2021-05-03" "2021-01-01" "2021-02-12"
# Create a new data frame data before dates filter() or another approach
newdf %>% group_by(stand,dates) %>% dplyr::filter(...)
Here I have a brain frozen, because a need a new data frame (newdf
) with the data until before the dates
in mycond
. Please, any ideas for solving it?
You can join the two datasets by 'stand'
and keep rows if dates in mydf
is less than equal to dates in mycond
.
library(dplyr)
mydf %>% inner_join(mycond, by = 'stand') %>% filter(dates.x <= dates.y)
In base R -
subset(merge(mydf, mycond, by = 'stand'), dates.x <= dates.y)
You can avoid the filter
step if you use fuzzyjoin
where you can join by range.
fuzzyjoin::fuzzy_inner_join(mydf, mycond,
by = c('stand', 'dates'), match_fun = c(`==`, `<=`))