I have to datasets.
One includes service dates with durations for each person (id).
Second includes these persons care episodes with their start and end date.
I need to make a conditional join, meaning that the the second DF should include them sum of all service durations that happened during a particular episode (between dates "start" and "end").
I tried the following:
library(tidyverse)
services = read.csv("https://www.dropbox.com/scl/fi/04br87qn2wmpmoyaz95ri/services.csv?rlkey=vm7ib1xqne1ceawjnsmzeolf5&dl=1")
periods = read.csv("https://www.dropbox.com/scl/fi/fow33g4ddroinspyi4xy8/periods.csv?rlkey=v9cr0thuh6pm0rc7qrhommkpe&dl=1")
left_join(periods, services, by = join_by(id, between(service_date, start, end)))
But that gave me the following error
Per the dplyr documentation, between(x, y_lower, y_upper)
in join_by()
checks if x falls between y_lower and y_upper. The error occurs because service_date
is from the y-dataset (services).
As @Friede said, you need to swap the x and y datasets.
We then join the results to the second data frame periods
To get the sum, we can group_by
id and then summarise the durations. Then, we could use na.omit() %>% distinct(id, .keep_all = T)
to omit all ids which don't have service durations and only keep unique ids.
library(tidyverse)
library(dplyr)
services = read.csv("https://www.dropbox.com/scl/fi/04br87qn2wmpmoyaz95ri/services.csv?rlkey=vm7ib1xqne1ceawjnsmzeolf5&dl=1")
periods = read.csv("https://www.dropbox.com/scl/fi/fow33g4ddroinspyi4xy8/periods.csv?rlkey=v9cr0thuh6pm0rc7qrhommkpe&dl=1")
periods_w_total_service_times <- left_join(
periods,
left_join(services, periods, by = join_by(id, between(service_date, start, end))) %>%
group_by(id) %>%
summarise(total_service_duration = sum(duration)),
by = "id"
) %>%
na.omit() %>% distinct(id, .keep_all = T)
X id start end total_service_duration
1 33 ID100107 2016-06-14 2016-12-08 1.500000
2 34 ID10014 2011-03-09 2011-08-28 2.000000
3 46 ID100184 2016-10-08 2016-11-22 6.166667
4 52 ID100218 2013-02-11 2013-08-06 1.666667
5 57 ID100233 2011-08-15 2012-01-23 6.000000
6 69 ID100284 2013-07-12 2014-01-01 1.500000
...
.
.