I have a database that looks like this:
Year ID Date Occupy
2010 1 10 Yes
2010 2 11 No
2010 3 12 Yes
2010 4 9 No
2010 5 15 No
2011 7 7 Yes
2011 8 9 Yes
2011 9 10 Yes
2011 11 12 No
I am trying to create a code that first checks which dates are the first and last ones to be occupied (here date is the day of the month) each year. In 2010, these dates should be 10 and 12,and in 2011, 7 and 10. Then the code should filter out those rows with dates smaller or bigger than these first and last occupied dates.
The outcome should be:
Year ID Date Occupy
2010 1 10 Yes
2010 2 11 No
2010 3 12 Yes
2011 7 7 Yes
2011 8 9 Yes
2011 9 10 Yes
I tried to do these two steps in two separate parts using:
lapply(function(x) c(min(x), max(x)))
and then grouping and filtering with lubridate, but everything crashes or does not what I want.
If you prefer dplyr
syntax:
library(dplyr)
df <- tribble(
~Year, ~ID, ~Date, ~Occupy,
2010, 1, 10, "Yes",
2010, 2, 11, "No",
2010, 3, 12, "Yes",
2010, 4, 9, "No",
2010, 5, 15, "No",
2011, 7, 7, "Yes",
2011, 8, 9, "Yes",
2011, 9, 10, "Yes",
2011, 11, 12, "No"
)
df |>
mutate(
min_occupied = min(Date[Occupy == "Yes"]),
max_occupied = max(Date[Occupy == "Yes"]),
.by = Year
) |>
filter(between(Date, min_occupied, max_occupied))
#> # A tibble: 6 × 6
#> Year ID Date Occupy min_occupied max_occupied
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 2010 1 10 Yes 10 12
#> 2 2010 2 11 No 10 12
#> 3 2010 3 12 Yes 10 12
#> 4 2011 7 7 Yes 7 10
#> 5 2011 8 9 Yes 7 10
#> 6 2011 9 10 Yes 7 10
Created on 2023-11-08 with reprex v2.0.2