Search code examples
rdateextract

How to get x days after certain condition in R


I have a dataset with multiple dates and conditions. I would like to extract all the rows which start with condition place == "A" and all rows with start date of place == "A" and up to 7 days later. For example:

Date        Place       Value1      Value2
2018-10-27  C           20          8 
2018-10-29  A           10          5
2018-10-31  B           15          6
2018-11-4   C           17          9 
2018-11-8   D           18          5  

And I want:

Date        Place       Value1      Value2
2018-10-29  A           10          5
2018-10-31  B           15          6
2018-11-4   C           17          9 

As you can see it must extract the first row with place == A and all rows within 7 days later. The places after the first day like "A" doesn't make sense, like "B" and "C". It must start with "A". It skips 2018-11-8 because that is more than 7 days from 2018-10-29.

I tried it like this question: R: Extract data based on date, "if date lesser than" , but I don't know how to extract the 7 days.


Solution

  • We can use match to get corresponding Date value and select all the rows within 7 days from it.

    library(dplyr)
    
    df %>%
      mutate(Date = as.Date(Date)) %>%
      filter({tmp <- Date[match('A', Place)] 
              between(Date, tmp, tmp + 7)})
    
    #        Date Place Value Value.1
    #1 2018-10-29     A    10       5
    #2 2018-10-31     B    15       6
    #3 2018-11-04     C    17       9
    

    dplyr allows to perform operations without creating temporary variables in global environment, the above solution can be written in base R as :

    df$Date <- as.Date(df$Date)
    date_val <- df$Date[match('A', df$Place)]
    subset(df, Date >= date_val & Date <= date_val + 7)
    

    data

    df <- structure(list(Date = structure(c(17831, 17833, 17835, 17839, 
    17843), class = "Date"), Place = c("C", "A", "B", "C", "D"), 
        Value = c(20L, 10L, 15L, 17L, 18L), Value.1 = c(8L, 5L, 6L, 
        9L, 5L)), row.names = c(NA, -5L), class = "data.frame")