Search code examples
sqlrpostgresqlleft-join

Join with two conditions in R


I want to perform a join in RStudio with 2 conditions. My goal is to replicate a SQL query like the following, but in R language:

SELECT
   ticket_id,
   sum(time_to_close)
FROM helpdesk_ticket ht
LEFT JOIN dim_date dd ON ht.create_date <= dd.formatted_date AND dd.formatted_date <= ht.closed_date
GROUP BY 1

Assuming that helpdesk_ticket and dim_date are two R dataframes, this is what I have tried, but it returns an error in the left_join:

result <- helpdesk_ticket %>%
  left_join(dim_date, by = c("create_date" <= "formatted_date", "formatted_date" <= "closed_date")) %>%
  group_by(ticket_id) %>%
  summarise(time_to_close = sum(time_to_close))

Solution

  • Because the SQL query uses an interval join with <= and >= operators rather than equality =, consider dplyr::join_by to precompute the inequality join to be used in left_join.

    by <- join_by(
      create_date <= formatted_date, 
      closed_date >= formatted_date,
    ) 
    
    result <- helpdesk_ticket %>%
      left_join(dim_date, by) %>%
      group_by(ticket_id) %>%
      summarise(time_to_close = sum(time_to_close))
    

    Alternatively, if both or either data sets are not too large for cartesian product, consider a cross_join followed by a filter:

    result <- helpdesk_ticket %>%
      cross_join(dim_date) %>%
      filter(
        create_date <= formatted_date, 
        closed_date >= formatted_date
      )