r dataframe dplyr lapply

Remap database using key


I have a dataset that I need to update using a key dataset: every entry in columns group_1:group_3 should be replaced by its corresponding leader from the key.

Note that my real key dataset has more than 25k entries, so I am looking for an efficient solution. All help appreciated.

Toy example

df <- data.frame(state=rep("state_a"),
             candidate=c("a","b","c"),
             group_1= c("g_1","g_2","g_3"),
             group_2= c("g_4","g_5",NA),
             group_3= c("g_5",NA,NA))

key <- data.frame(group=c("g_1","g_2","g_3","g_4","g_5"),
              leader=c("l_1","l_2","l_3","l_4","l_4"))

Result:

df <- data.frame(state=rep("state_a"),
             candidate=c("a","b","c"),
             group_1= c("l_1","l_2","l_3"),
             group_2= c("l_4","l_4",NA),
             group_3= c("l_4",NA,NA))

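ADDITIONAL REQUEST: I would like to use df_2 (same dimensions as df) to decide which entries to keep (df_final_temp), and then apply the key to get df_final. In the example below, the entry whose value in df_2 is "0" is dropped.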

df_2 <- data.frame(state=rep("state_a"),
                   candidate=c("a","b","c"),
                   value_1= c("1","2","0"),
                   value_2= c("3","2",NA),
                   value_3= c("5",NA,NA))

df_final_temp <- data.frame(state=rep("state_a"),
             candidate=c("a","b","c"),
             group_1= c("g_1","g_2",NA),
             group_2= c("g_4","g_5",NA),
             group_3= c("g_5",NA,NA))

df_final <- data.frame(state=rep("state_a"),
             candidate=c("a","b","c"),
             group_1= c("l_1","l_2",NA),
             group_2= c("l_4","l_4",NA),
             group_3= c("l_4",NA,NA))

Solution

  • One option is to turn the key into a named vector (names are the groups, values are the leaders) and index it with the group columns converted to a matrix

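    # named lookup vector: names are the groups, values are the leaders
    lookup <- setNames(as.character(key$leader), key$group)
    # index it with the group columns as a character matrix; NA entries stay NA
    df[-(1:2)] <- lookup[as.matrix(df[-(1:2)])]

    df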
    #    state candidate group_1 group_2 group_3
    #1 state_a         a     l_1     l_4     l_4
    #2 state_a         b     l_2     l_4    <NA>
    #3 state_a         c     l_3    <NA>    <NA>
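
For the additional request, the same named-vector lookup can be applied after masking df with df_2. The following is only a minimal sketch under stated assumptions: an entry is kept when the matching cell in df_2 is non-missing and not "0" (a rule inferred from df_final_temp in the question, not stated there), and the value_* columns line up positionally with the group_* columns. The names group_cols, value_cols, keep and lookup are introduced here for illustration.

```r
# Toy data from the question
df <- data.frame(state = "state_a",
                 candidate = c("a", "b", "c"),
                 group_1 = c("g_1", "g_2", "g_3"),
                 group_2 = c("g_4", "g_5", NA),
                 group_3 = c("g_5", NA, NA),
                 stringsAsFactors = FALSE)

df_2 <- data.frame(state = "state_a",
                   candidate = c("a", "b", "c"),
                   value_1 = c("1", "2", "0"),
                   value_2 = c("3", "2", NA),
                   value_3 = c("5", NA, NA),
                   stringsAsFactors = FALSE)

key <- data.frame(group = c("g_1", "g_2", "g_3", "g_4", "g_5"),
                  leader = c("l_1", "l_2", "l_3", "l_4", "l_4"),
                  stringsAsFactors = FALSE)

# Column positions, assumed to correspond one-to-one (group_1 <-> value_1, ...)
group_cols <- grep("^group_", names(df))
value_cols <- grep("^value_", names(df_2))

# ASSUMED rule: keep an entry only if its df_2 value is present and not "0"
values <- as.matrix(df_2[value_cols])
keep <- !is.na(values) & values != "0"

# Step 1: blank out the entries that are not kept
df_final_temp <- df
df_final_temp[group_cols][!keep] <- NA

# Step 2: remap the surviving groups to leaders via a named vector
lookup <- setNames(as.character(key$leader), key$group)
df_final <- df_final_temp
df_final[group_cols] <- unname(lookup[as.matrix(df_final_temp[group_cols])])

df_final
# On the toy data: group_1 = l_1, l_2, NA; group_2 = l_4, l_4, NA;
# group_3 = l_4, NA, NA (g_5 maps to l_4 in the key)
```

Indexing a named vector by name is vectorised, so the cost should grow with the number of cells in df rather than requiring a join per column. With a key of 25k+ rows it may still be worth checking for duplicated group names first, since name lookup returns the first match.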