I have this data fram and I want to create an additional column that tells me the date the category was previously active.
DF <- data.frame(
Date = rep(c("10-12-2024", "10-17-2024", "10-19-2024"), c(4L, 2L, 2L)),
category = c("Red", "Red", "Blue", "Blue", "Blue", "Blue", "Red", "Blue")
)
Output:
Date | category |
---|---|
10-12-2024 | Red |
10-12-2024 | Red |
10-12-2024 | Blue |
10-12-2024 | Blue |
10-17-2024 | Blue |
10-17-2024 | Blue |
10-19-2024 | Red |
10-19-2024 | Blue |
So I would like a 3rd column of Previous active day
for that category.
I'm expecting the output to look like this:
Date | category | previous active day |
---|---|---|
10-12-2024 | Red | |
10-12-2024 | Red | |
10-12-2024 | Blue | |
10-12-2024 | Blue | |
10-17-2024 | Blue | 10-12-2024 |
10-17-2024 | Blue | 10-12-2024 |
10-19-2024 | Red | 10-12-2024 |
10-19-2024 | Blue | 10-17-2024 |
I would group by category and date then take the first entry - this identifies unique dates that each category was active. Then, lag the date variable by group and merge it back into the original data.
library(dplyr)
DF <- data.frame(
Date = rep(c("10-12-2024", "10-17-2024", "10-19-2024"), c(4L, 2L, 2L)),
category = c("Red", "Red", "Blue", "Blue", "Blue", "Blue", "Red", "Blue")
)
DF %>%
group_by(Date, category) %>%
slice_head(n=1) %>%
group_by(category) %>%
mutate(previous_active_date = lag(Date)) %>%
right_join(DF)
#> Joining with `by = join_by(Date, category)`
#> # A tibble: 8 × 3
#> # Groups: category [2]
#> Date category previous_active_date
#> <chr> <chr> <chr>
#> 1 10-12-2024 Blue <NA>
#> 2 10-12-2024 Blue <NA>
#> 3 10-12-2024 Red <NA>
#> 4 10-12-2024 Red <NA>
#> 5 10-17-2024 Blue 10-12-2024
#> 6 10-17-2024 Blue 10-12-2024
#> 7 10-19-2024 Blue 10-17-2024
#> 8 10-19-2024 Red 10-12-2024
Created on 2025-01-31 with reprex v2.1.1