Search code examples
rleft-joinsqldf

Avoid duplicates columns with left join sqldf


I'm trying to do a left join using sqldf package on R.

data <- sqldf(
" SELECT a.*, b.var1, b.var2, b.id 
FROM table1 as a LEFT JOIN table2 as b 
ON a.id=b.id "
)

Unfortunately I get two columns named "id" in data. How can I avoid this problem?

Thanks

Chloé


Solution

  • Note that the code is asking for two id's as a.* includes all a columns and it explicitly asked for b.id. You can simply drop the b.id

    library(sqldf)
    # test inputs
    table1 <- data.frame(id = 0, a = 3)
    table2 <- data.frame(id = 0, var1 = 1, var2 = 2)
    
    sqldf(
      " SELECT a.*, b.var1, b.var2
        FROM table1 as a LEFT JOIN table2 as b 
        ON a.id = b.id "
    )
    ##   id a var1 var2
    ## 1  0 3    1    2
    

    or if you want all b columns except id then use USING instead of ON

    sqldf(
      " SELECT *
        FROM table1 as a LEFT JOIN table2 as b 
        USING(id) "
    )
    ##   id a var1 var2
    ## 1  0 3    1    2
    

    If you do want two id's but just named differently then this will name the b.id as id2.

    sqldf(
      " SELECT a.*, b.var1, b.var2, b.id as id2
      FROM table1 as a LEFT JOIN table2 as b 
      ON a.id=b.id "
    )
    ##   id a var1 var2 id2
    ## 1  0 3    1    2   0