As part of a larger data-set, I have data along the lines of the following:
data <- structure(
list(
ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
Event = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
Date = c("12-24-2019", "12-25-2019", "12-26-2019", "12-27-2019", "12-28-2019", "12-29-2019", "12-30-2019", "12-31-2019", "12-01-2019", "12-02-2019", "12-03-2019", "12-04-2019"
)
), class = "data.frame", row.names = c(NA, -12L))
Resulting in the following:
Essentially, I am wanting to end up with one row per ID, but which row is kept depends on whether the event occurred for an ID or not.
If the event occurs for an ID, I want to keep the date that the event occurred and get rid of the other rows for that ID
If the event does not occur at any point for an ID I want to keep the latest date and get rid of the other rows for that ID
Desired output:
I'm quite new to R and am not sure how to start with this - my googling hasn't helped!
Many thanks for any help
Here is adplyr
solution using summarise
:
library(dplyr)
data %>%
group_by(ID) %>%
summarise(
Date = c(Date[Event == 1], last(Date))[1],
Event = max(Event),
)
# A tibble: 4 × 3
ID Date Event
<dbl> <chr> <dbl>
1 1 12-24-2019 1
2 2 12-28-2019 0
3 3 12-31-2019 0
4 4 12-02-2019 1
This works for the provided data and for all datasets, where there is at most one event per ID
. Otherwise the code would have to be slightly modified.
If it is possible to have more Events per ID
then we could use tail
to get the Date of the last Event.
data %>%
group_by(ID) %>%
summarise(
Date = c(tail(Date[Event == 1], 1), last(Date))[1],
Event = max(Event),
)
The idea is very simple: inside summarise
I create a c(tail(Date[Event == 1], 1), last(Date))
The first element tail(Date[Event == 1], 1)
is equal to the last entry (tail(..., 1)
) of Date
for which Event == 1
. If no such entry exists this expression evaluates to a vector of length zero, making the overall vector length 1.
The second element is simply the last Date.
Finally, I use [1]
at the end to extract the first element of the vector described above.