Search code examples
rfiltergroup-bymaxmin

Filter database based on max and min that varies each year


I have a database that looks like this:

    Year ID Date Occupy
    2010 1  10   Yes
    2010 2  11   No
    2010 3  12   Yes
    2010 4  9    No
    2010 5  15   No
    2011 7  7    Yes
    2011 8  9    Yes
    2011 9  10   Yes
    2011 11 12   No

I am trying to create a code that first checks which dates are the first and last ones to be occupied (here date is the day of the month) each year. In 2010, these dates should be 10 and 12,and in 2011, 7 and 10. Then the code should filter out those rows with dates smaller or bigger than these first and last occupied dates.

The outcome should be:

    Year ID Date Occupy
    2010 1  10   Yes
    2010 2  11   No
    2010 3  12   Yes
    2011 7  7    Yes
    2011 8  9    Yes
    2011 9  10   Yes
  

I tried to do these two steps in two separate parts using:

    lapply(function(x) c(min(x), max(x)))

and then grouping and filtering with lubridate, but everything crashes or does not what I want.


Solution

  • If you prefer dplyr syntax:

    library(dplyr)
    
    df <- tribble(
            ~Year, ~ID, ~Date, ~Occupy,
             2010,   1,    10,   "Yes",
             2010,   2,    11,    "No",
             2010,   3,    12,   "Yes",
             2010,   4,     9,    "No",
             2010,   5,    15,    "No",
             2011,   7,     7,   "Yes",
             2011,   8,     9,   "Yes",
             2011,   9,    10,   "Yes",
             2011,  11,    12,    "No"
            )
    
    df |> 
      mutate(
        min_occupied = min(Date[Occupy == "Yes"]),
        max_occupied = max(Date[Occupy == "Yes"]),
        .by = Year
      ) |> 
      filter(between(Date, min_occupied, max_occupied))
    #> # A tibble: 6 × 6
    #>    Year    ID  Date Occupy min_occupied max_occupied
    #>   <dbl> <dbl> <dbl> <chr>         <dbl>        <dbl>
    #> 1  2010     1    10 Yes              10           12
    #> 2  2010     2    11 No               10           12
    #> 3  2010     3    12 Yes              10           12
    #> 4  2011     7     7 Yes               7           10
    #> 5  2011     8     9 Yes               7           10
    #> 6  2011     9    10 Yes               7           10
    

    Created on 2023-11-08 with reprex v2.0.2