
Assign new values to a data frame based on values stored in another data frame


Here is a simplified version of my issue. Say I have 6 observations (identified by pid) of two variables:

    pid <- c(1,2,3,4,5,6)
    V1 <- c(11,11,33,11,22,33)
    V2 <- c("A", "C", "M", "M", "A", "A")
    data <- data.frame(pid, V1, V2)
    #   pid V1 V2
    # 1   1 11  A
    # 2   2 11  C
    # 3   3 33  M
    # 4   4 11  M
    # 5   5 22  A
    # 6   6 33  A

I would like to create a new column whose values depend on the combination of V1 and V2 in each row. The value for each combination is stored in a second data frame:

    V1 <- c(11,11,11,22,22,22,33,33,33)
    V2 <- c("A", "C", "M", "A", "C", "M", "A", "C", "M")
    valueA <- c(16,26,36,46,56,66,76,86,96)
    valueB <- c(15,25,35,45,55,65,75,85,95)
    values <- data.frame(V1, V2, valueA, valueB)
    #   V1 V2 valueA valueB
    # 1 11  A     16     15
    # 2 11  C     26     25
    # 3 11  M     36     35
    # 4 22  A     46     45
    # 5 22  C     56     55
    # 6 22  M     66     65
    # 7 33  A     76     75
    # 8 33  C     86     85
    # 9 33  M     96     95

I tried this, following @akrun's suggestion:

    library(dplyr)  # provides mutate()

    data <- mutate(data,
                   valueA = as.integer(ifelse(data$V1 %in% values$V1
                                              & data$V2 %in% values$V2, values$valueA, NA))
                   )

But the result is the following:

    #   pid V1 V2 valueA
    # 1   1 11  A     16
    # 2   2 11  C     26
    # 3   3 33  M     36
    # 4   4 11  M     46
    # 5   5 22  A     56
    # 6   6 33  A     66

As you can see, the combination 33 M (pid 3) gets 36, while it should be 96.

I would like to achieve this:

    #   pid V1 V2 valueA
    # 1   1 11  A     16
    # 2   2 11  C     26
    # 3   3 33  M     96
    # 4   4 11  M     36
    # 5   5 22  A     46
    # 6   6 33  A     76

Any suggestions on how to fix this? Any help would be much appreciated!


Solution

  • I solved the issue above by creating a single key column that combines V1 and V2, as follows:

    data$unique  <- paste(data$V1,data$V2)
    values$unique <- paste(values$V1, values$V2)
    

    and then merged by the new column:

    merge(x = data, y = values, by = "unique")
    #   unique pid V1.x V2.x V1.y V2.y valueA valueB
    # 1   11 A   1   11    A   11    A     16     15
    # 2   11 C   2   11    C   11    C     26     25
    # 3   11 M   4   11    M   11    M     36     35
    # 4   22 A   5   22    A   22    A     46     45
    # 5   33 A   6   33    A   33    A     76     75
    # 6   33 M   3   33    M   33    M     96     95
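
As a possible variation on the merge above, here is a minimal sketch that joins on both columns directly, so no helper column is needed and V1 and V2 are not duplicated as V1.x / V1.y. It assumes each (V1, V2) pair appears only once in values. The names result and key_pos are only illustrative and do not come from the original post.

```r
# Rebuild the two data frames from the question.
data <- data.frame(
  pid = c(1, 2, 3, 4, 5, 6),
  V1  = c(11, 11, 33, 11, 22, 33),
  V2  = c("A", "C", "M", "M", "A", "A")
)
values <- data.frame(
  V1     = c(11, 11, 11, 22, 22, 22, 33, 33, 33),
  V2     = c("A", "C", "M", "A", "C", "M", "A", "C", "M"),
  valueA = c(16, 26, 36, 46, 56, 66, 76, 86, 96),
  valueB = c(15, 25, 35, 45, 55, 65, 75, 85, 95)
)

# Option 1: left join on both key columns at once. all.x = TRUE keeps every
# row of `data` and fills NA where a combination is missing from `values`.
result <- merge(data, values[, c("V1", "V2", "valueA")],
                by = c("V1", "V2"), all.x = TRUE)

# merge() sorts by the key columns, so restore the original pid order.
result <- result[order(result$pid), c("pid", "V1", "V2", "valueA")]
rownames(result) <- NULL

# Option 2: positional lookup that never reorders rows. match() returns, for
# each row of `data`, the row of `values` holding the same (V1, V2) pair.
key_pos <- match(paste(data$V1, data$V2), paste(values$V1, values$V2))
data$valueA <- values$valueA[key_pos]
```

Both options pair each row with its own (V1, V2) combination. The ifelse() attempt in the question could not do that: every %in% test is TRUE, so it simply took the first six entries of values$valueA in order.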