Search code examples
rdata.tablekey

R. Data.table. How to diff columns with the same names for two different data.table?


I have two data.tables.

dt1 <- data.table("Symbol1" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")

I want to add a new column "Description" for the first data.table from the second one (where "Symbol" is key ) by reference. I try use the next way:

dt1[, Description := dt2[.(Symbol1), Description]]

And this works fine. I've got the result

dt1
Symbol1 Volume        Description
1:  EURUSD      1     Euro vs Dollar
2:  USDCAD     10 Canadian vs Dollar
3:  EURUSD      2     Euro vs Dollar
4:  CADJPY     20               <NA>

But if these two data.tables have the same column name (Symbol)

dt1 <- data.table("Symbol" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
dt1[, Description := dt2[.(Symbol), Description]]

I've got the error:

Error in `[.data.table`(dt1, , `:=`(Description, dt2[.(Symbol), Description])) : 
  Supplied 2 items to be assigned to 4 items of column 'Description'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Is there any way to make it work correctly? Thanks for any help!


Solution

  • In your second trial, it fails because the Symbol in dt2[.(Symbol), Description] is from dt2, not from dt1. In you first trial, Symbol1 is not found in dt2 so R will try to search dt1 for Symbol1. (If this fails too, R will try to search the global environment for Symbol1. Therefore, orders matter here.)

    I would do it this way:

    dt1 <- data.table("Symbol1" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
    dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
    
    dt1[dt2, Description := i.Description, on = .(Symbol1 = Symbol)]
    
    dt1 <- data.table("Symbol" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
    dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
    
    dt1[dt2, Description := i.Description, on = .(Symbol)]
    

    Another workaround is to use get and tell it explicitly where to search for Symbol. For example:

    dt1[Symbol != "EURUSD", Description := dt2[.(get('Symbol', env = sys.parent(3))), Description]]
    dt1
    #    Symbol Volume        Description
    # 1: EURUSD      1               <NA>
    # 2: USDCAD     10 Canadian vs Dollar
    # 3: EURUSD      2               <NA>
    # 4: CADJPY     20               <NA>