I have a table with individuals that perform activities during time intervals (bounded by start and end dates). Some of them have several activities, some others don’t. Sometimes the activities overlap, and sometimes they are separated by a period of inactivity.
I would like to group by individuals and compute the number of months they have been active, no matter the activity.
Here is a small reproducible example:
df <- data.frame(id=c("x", "y", "y", "z", "z"),
activity=c("a1", "b1", "b2", "c1", "c2"),
start=c(as.Date("2017-07-01"),
as.Date("2018-04-01"), as.Date("2018-07-01"),
as.Date("2017-07-01"), as.Date("2018-02-01")),
end=c(as.Date("2018-07-31"),
as.Date("2018-05-31"), as.Date("2018-07-31"),
as.Date("2018-02-28"), as.Date("2018-07-31")))
It would look like this on a Gantt chart:
Does anyone could think of a data.table
or dplyr
solution to get the following output?
id | active_months
x | 13
y | 3
z | 13
We can create monthly sequence between start
and end
date and count unique month values for each id
.
library(dplyr)
df %>%
group_by(id) %>%
summarise(months = n_distinct(unlist(purrr::map2(start, end,
~seq(.x, .y, by = 'month')))))
# id months
# <chr> <int>
#1 x 13
#2 y 3
#3 z 13