Search code examples
rinner-join

How to do conditional joins based on column values


enter image description here

I am trying to join to a lookup table based on the field value of my base table "Score" being between two field values in the lookup table ("ATTRIBUTE_2" and "ATTRIBUTE_3").

Basically the R version of the following SQL:

SELECT base.PolicyNo
    ,base.Score
    ,lkp.value ScoreFactor
FROM base base
INNER JOIN lkp lkp
    ON base.PERIL = lkp.PERIL
    AND base.COVERAGE = lkp.COVERAGE
    AND base.LocStCd = lkp.ATTRIBUTE_1
    AND base.Score > lkp.ATTRIBUTE_2
    AND base.Score <= lkp.ATTRIBUTE_3

I'm very new to R and still trying to grasp a lot of the syntax, so please be detailed in your responses :)

I've tried

TIERLKP[DF1, on = c("PERIL","COVERAGE",ATTRIBUTE_1 = "LocStCd",ATTRIBUTE_2 < "Score",ATTRIBUTE_3 >= "TOTALTIERSCOREVA"), nomatch=NA]

and R tells me that ATTRIBUTE_2 and ATTRIBUTE_3 can't be found

I also tried

u = TIERLKP[DF1, on = c("PERIL","COVERAGE",ATTRIBUTE_1 = "LocStCd"), nomatch=NA]
    u = u[u$ATTRIBUTE_2 < u$Score,]
    u = u[u$ATTRIBUTE_3 >= u$Score,]

Which results in massive duplication of rows and the need for a cartesian join...doesn't seem efficient.

Per Phil's comment I tried

left_join(DF1,TIERLKP, by = c("PERIL","COVERAGE","LocStCd" = "ATTRIBUTE_1","Score" > "ATTRIBUTE_2", "Score" <= "ATTRIBUTE_3"),copy = TRUE)

with/without quotes as well as changing the order but am still getting "Error: Join columns must be present in data."


Solution

  • We may use the data.table syntax as below. Also, convert to data.table if the data is data.frame with setDT

    library(data.table)
    setDT(TIERLKP)
    setDT(DF1)
    TIERLKP[DF1, on = .(PERIL, COVERAGE,ATTRIBUTE_1 = LocStCd,
       ATTRIBUTE_2 < Score,
       ATTRIBUTE_3 >= TOTALTIERSCOREVA), nomatch=NA] 
    

    In the devel version of dplyr, we can use join_by which does the non-equi join

    library(dplyr)
    inner_join(TIERLKP, DF1, by = join_by(PERIL, COVERAGE,
          ATTRIBUTE_1 = LocStCd,   
           ATTRIBUTE_2 < Score,
       ATTRIBUTE_3 >= TOTALTIERSCOREVA))