Search code examples
rleft-joinmultiple-conditions

R - join tables in R using logical conditions and using >, < and = operator


Kinda new in R, but trying to figure it out.

So, I have a query from MySQL that joins two tables based on their tunetime (Login) and combined with the TV scheadule I can check what they are watching.

In SQL I can do it like this:

SELECT 
    s.Device,
    s.CallLetter AS Channel,
    e.Title AS Program_Title,
    e.TitleId AS Program_Title_ID,
    s.TuneDuration AS Stream_Time,
    s.TuneTime AS Login_Time,
    e.StartDate AS StartTime,
    e.EndDate AS End_Time
FROM
    sessoeslive_canais s
        JOIN
    epg e ON s.TuneTime >= e.StartDate
        AND s.TuneTime <= e.EndDate
        AND s.CallLetter = e.CallLetter

But now, in R, that problem seems to be bigger.

I've written something like:

join <- 
 sessoeslive_canais %>%
 left_join(epg, by = c("CallLetter" == "CallLetter", "TuneTime" > "StarDate", "TuneTime" < "EndDate" ))

Do you have any tip on how to do in R what I've expressed in SQL?


Solution

  • Here we can use fuzzyjoin

    library(fuzzyjoin)
    fuzzy_left_join(sessoeslive_canais, epg, by = c("CallLetter" == "CallLetter",
          "TuneTime" = "StarDate", "TuneTime" = "EndDate" ), 
               match_fun = list(`==`, `>`, `<`)) %>%
      select(Program_Title = Title.y, Program_Title_ID = TitleId.y,
         Stream_Time = TuneDuration.x, Login_Time = TuneTime.x,
         StartTime = StartDate.y, End_Time = EndDate.y)
    

    Or with non-equi join from data.table

    library(data.table)
    setDT(sessoeslive_canais)[epg, on = .(CallLetter, 
             TuneTime > StartDate, TuneTime < EndDate)]