r dplyr conditional-statements left-join

R dplyr conditional join with "join_by" not working

I have to datasets.

One includes service dates with durations for each person (id).

Second includes these persons care episodes with their start and end date.

I need to make a conditional join, meaning that the the second DF should include them sum of all service durations that happened during a particular episode (between dates "start" and "end").

I tried the following:

library(tidyverse)

services =  read.csv("https://www.dropbox.com/scl/fi/04br87qn2wmpmoyaz95ri/services.csv?rlkey=vm7ib1xqne1ceawjnsmzeolf5&dl=1")
periods =  read.csv("https://www.dropbox.com/scl/fi/fow33g4ddroinspyi4xy8/periods.csv?rlkey=v9cr0thuh6pm0rc7qrhommkpe&dl=1")

left_join(periods, services, by = join_by(id, between(service_date, start, end)))

But that gave me the following error

Solution

Error

Per the dplyr documentation, between(x, y_lower, y_upper) in join_by() checks if x falls between y_lower and y_upper. The error occurs because service_date is from the y-dataset (services). As @Friede said, you need to swap the x and y datasets.

Solution

We then join the results to the second data frame periods

To get the sum, we can group_by id and then summarise the durations. Then, we could use na.omit() %>% distinct(id, .keep_all = T) to omit all ids which don't have service durations and only keep unique ids.

library(tidyverse)
library(dplyr)

services =  read.csv("https://www.dropbox.com/scl/fi/04br87qn2wmpmoyaz95ri/services.csv?rlkey=vm7ib1xqne1ceawjnsmzeolf5&dl=1")
periods =  read.csv("https://www.dropbox.com/scl/fi/fow33g4ddroinspyi4xy8/periods.csv?rlkey=v9cr0thuh6pm0rc7qrhommkpe&dl=1")


periods_w_total_service_times <- left_join(
  periods,
  left_join(services, periods, by = join_by(id, between(service_date, start, end))) %>%
    group_by(id) %>%
    summarise(total_service_duration = sum(duration)),
  by = "id"
) %>%
  na.omit() %>% distinct(id, .keep_all = T)

Result

   X       id      start        end total_service_duration
1 33 ID100107 2016-06-14 2016-12-08               1.500000
2 34  ID10014 2011-03-09 2011-08-28               2.000000
3 46 ID100184 2016-10-08 2016-11-22               6.166667
4 52 ID100218 2013-02-11 2013-08-06               1.666667
5 57 ID100233 2011-08-15 2012-01-23               6.000000
6 69 ID100284 2013-07-12 2014-01-01               1.500000

...