Consider the following datasets
data_00 <- data.table(ID = c(1,1,1,2,2,2,3,3,3,4,5,5,6),
COLOUR = c("blue","green","yellow","yellow","red","blue","green","green","white","green","blue","yellow","white"))
data_01 <- data.table(ID=c(1,2,2,2,3,3,4,4,5,6,6),
COLOUR=c("red","blue","green","white","yellow","blue","white","green","blue","white","pink"))
ID COLOUR
1: 1 blue
2: 1 green
3: 1 yellow
4: 2 yellow
5: 2 red
6: 2 blue
7: 3 green
8: 3 green
9: 3 white
10: 4 green
11: 5 blue
12: 5 yellow
13: 6 white
ID COLOUR
1: 1 red
2: 2 blue
3: 2 green
4: 2 white
5: 3 yellow
6: 3 blue
7: 4 white
8: 4 green
9: 5 blue
10: 6 white
11: 6 pink
with ID representing a person's identification and colour the wall colour of the person's room. I use the data.table package as the real data is very big and thus an efficient package is required. I want to compare the colours stated in b with the colours in a, investigating whether a specific household had this wall colour also in the year before and add a third column to data_01 with the logical values for this question.
I tried with
data_01 <- data_01[COLOUR00:=(COLOUR %in% data_00$ID[COLOUR]),by=ID]
but the logical values returned are wrong.
The expected output is supposed to be:
ID COLOUR PREV_YEAR
1: 1 red FALSE
2: 2 blue TRUE
3: 2 green FALSE
4: 2 white FALSE
5: 3 yellow FALSE
6: 3 blue FALSE
7: 4 white FALSE
8: 4 green TRUE
9: 5 blue TRUE
10: 6 white TRUE
11: 6 pink FALSE
I want to have an extra row indicating whether the colour was present in the households home the year before. Can someone help me tackle this problem?
A very simple approach would be:
data_01[, last_year := paste(ID,COLOUR) %chin% data_00[, paste(ID,COLOUR)]]
ID COLOUR last_year
1: 1 red FALSE
2: 2 blue TRUE
3: 2 green FALSE
4: 2 white FALSE
5: 3 yellow FALSE
6: 3 blue FALSE
7: 4 white FALSE
8: 4 green TRUE
9: 5 blue TRUE
10: 6 white TRUE
11: 6 pink FALSE
Using joins:
data_01[, last_year := FALSE
][data_00, on = .(ID,COLOUR), last_year := TRUE]