Here I share with you a simplified version of my issue. Say I have 6 observations (pid) for two variables:
pid <- c(1,2,3,4,5,6)
V1 <- c(11,11,33,11,22,33)
V2 <- c("A", "C", "M", "M", "A", "A")
data <- data.frame(pid, V1, V2)
# pid V1 V2
# 1 1 11 A
# 2 2 11 C
# 3 3 33 M
# 4 4 11 M
# 5 5 22 A
# 6 6 33 A
I would like to create a new column based on the values associated to the different combinations I have of V1 and V2, that stored in a second database:
V1 <- c(11,11,11,22,22,22,33,33,33)
V2 <- c("A", "C", "M","A", "C", "M","A", "C", "M")
valueA <- c(16,26,36,46,56,66,76,86,96)
valueB <- c(15,25,35,45,55,65,75,85,95)
values <- data.frame(V1, V2, valueA, valueB)
# V1 V2 valueA valueB
# 1 11 A 16 15
# 2 11 C 26 25
# 3 11 M 36 35
# 4 22 A 46 45
# 5 22 C 56 55
# 6 22 M 66 65
# 7 33 A 76 75
# 8 33 C 86 85
# 9 33 M 96 95
I tried this, following @akrun suggestion:
data <- mutate (data,
valueA = as.integer (ifelse(data$V1 %in% values$V1
& data$V2 %in% values$V2, values$valueA, NA))
)
But the result is the following:
# pid V1 V2 valueA
# 1 1 11 A 16
# 2 2 11 C 26
# 3 3 33 M 36
# 4 4 11 M 46
# 5 5 22 A 56
# 6 6 33 A 66
As you can see, the combination 33 M is 36 while it should be 96...
I would like to archive this:
# pid V1 V2 valueA
# 1 1 11 A 16
# 2 2 11 C 26
# 3 3 33 M 96
# 4 4 11 M 36
# 5 5 22 A 46
# 6 6 33 A 76
any suggestions on how to fix this? Any help would me much appreciated!
I solved the issue above creating a single column merging V1 and V2 as follows:
data$unique <- paste(data$V1,data$V2)
values$unique <- paste(values$V1, values$V2)
and then merged by the new column:
merge(x = data, y = values, by = "unique")
# unique pid V1.x V2.x V1.y V2.y valueA valueB
# 1 11 A 1 11 A 11 A 16 15
# 2 11 C 2 11 C 11 C 26 25
# 3 11 M 4 11 M 11 M 36 35
# 4 22 A 5 22 A 22 A 46 45
# 5 33 A 6 33 A 33 A 76 75
# 6 33 M 3 33 M 33 M 96 95