Search code examples
rdplyrlubridate

get max maximum value of last 3 days if date matches from another dataframe in R?


i have two dataframe

df1= data.frame( ts = c('2020-01-15', '2020-01-16' , '2020-01-17', '2021-01-14', '2021-01-15','2021- 
                         01-16','2021-01-24','2021-01-25','2021-01-26'),
                aa_h=c(1,2,3,6,4,5,7,9,8),
                bh= c(12,13,14,11,11,11,122,12,56))

df2_mx=data.frame( ts = c('2020-01-17', '2021-01-16' , '2021-01-26'),
                aa= NA)

Now here i want to compare the dates of df2_mx from df1, and if matches, I want the max value of aa_h of the last two and the current day and insert it in "aa" column of df2_mx

Example

1st row i.e. '2020-01-17' of df2_mx would match the 3rd row of df1 and it would look up 2 days above and get the value which_max(c(1,2,3))--> 3 and insert it "aa" column of df2_mx .

Expected Output:

  df2_mx=data.frame( ts = c('2020-01-17', '2021-01-16' , '2021-01-26'),
                      aa= c(3,6,9))

Tryout Code n=1

for (i in 1:nrow(df1)){

 ifelse(which(as.Date(df1[i,1])==as.Date(df2_mx[,1])), 
oh_df_mx[which(as.Date(df1[i,1])==as.Date(df2_mx
[,1])),n+1]<-which.max(df1[(i-2):i,3]),invisible())
}

Solution

  • An option with fuzzyjoin package.

    library(dplyr)
    
    df1 %>%
      mutate(ts = as.Date(ts)) %>%
      fuzzyjoin::fuzzy_right_join(df2_mx %>%
                                   mutate(ts = as.Date(ts), ts_2_day = ts - 2), 
                                 by = c('ts', 'ts' = 'ts_2_day'), 
                                 match_fun = c(`<=`, `>=`)) %>%
      group_by(ts = ts.y) %>%
      summarise(aa_h = max(aa_h, na.rm = TRUE))
    
    #   ts          aa_h
    #  <date>     <dbl>
    #1 2020-01-17     3
    #2 2021-01-16     6
    #3 2021-01-26     9