I have a data set that contains movements of groups between different locations, which looks like this:
library(data.table)
df = data.table(Date = c('01/01/2021', '06/01/2021', '09/01/2021', '10/01/2021', '20/01/2021', '24/01/2021'),
Group = c('Group A', 'Group A', 'Group B', 'Group B', 'Group B', 'Group A'),
From = c('NA', 1, 3, 4, 5, 4),
To = c(1, 'NA', 4, 5, 'NA', 1))
Date Group From To
01/01/2021 Group A NA 1
06/01/2021 Group A 1 NA
09/01/2021 Group B 3 4
10/01/2021 Group B 4 5
20/01/2021 Group B 5 NA
24/01/2021 Group A 4 1
Now, I would like to create a data table for each group/location combination the time range that was spent at the specific location. In the case the "From" column contains NA, this group just entered the movement process. When the "To" column contains NA, this group exited the movement process. The desired data would then look like:
Group Date_entry Date_exit Location
Group A 01/01/2021 06/01/2021 1
Group B 09/01/2021 10/01/2021 4
Group B 10/01/2021 20/01/2021 5
Group A 24/01/2021 NA 1
I recommend an approach like the following:
df %>%
inner_join(
df,
by = c("Group" = "Group", "To" = "From"),
suffix = c("_F", "_T")
) %>%
select(Group, Date_F, Date_T, Location = To) %>%
filter(Location != "NA") %>%
mutate(Date_T = ifelse(Date_F < Date_T, Date_T, NA)) %>%
group_by(Group, Location, Date_F) %>%
summarise(Date_T = min(Date_T), .groups = "drop")
This may be a more general approach than you need - it is designed to handle many different rows per Group-Location pair.
How it works:
df
to itself, so we have records for all date pairs where a group enters and leaves a location. Note that these dates may be out of order, and dates may be repeated.