Search code examples
rdata.table

Create a new variable where each observation depends on another table and other variables in the data frame


I have the two following tables:

df <- data.frame(eth = c("A","B","B","A","C"),ZIP1 = c(1,1,2,3,5))
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),A = c(56,98,43,4,90,19,59), B = c(49,10,69,30,10,4,95),C = c(69,2,59,8,17,84,30))

eth    ZIP1         ZIP2    A    B    C
A      1            1      56   49   69
B      1            2      98   10   2
B      2            3      43   69   59
A      3            4      4    30   8
C      5            5      90   10   17
                    6      19   4    84
                    7      59   95   39

I would like to create a variable Inc in the df data frame where for each observation, the value is the intersection of the eth and ZIP of the observation. In my example, it would lead to:

   eth    ZIP1   Inc        
    A      1    56
    B      1    49
    B      2    10
    A      3    43
    C      5    17

A loop or quite brute force could solve it but it takes time on my dataset, I'm looking for a more subtle way maybe using data.table.


Solution

  • What about this?

    library(reshape2)
    merge(df, melt(Inc, id="ZIP2"), by.x = c("ZIP1", "eth"), by.y = c("ZIP2", "variable"))
      ZIP1 eth value
    1    1   A    56
    2    1   B    49
    3    2   B    10
    4    3   A    43
    5    5   C    17