Search code examples
rstringrstringi

For every time id1 is within the string in id3, put the id2 part in the new column


I would like to generate one additional column to this data frame with some additional information: every time id1 is within the string in id3, replace this portion with its counterpart in id2:

 library(tidyverse)



    df1 <- tibble(
         id1 = c("119930", "124659", "114679", "119934", "126821", "124679", "119842", "134863", "133678", "133675"),
         id2 = c("AB1", "AB2", "AB3", "AB5", "AB4", "AB0", "DF1", "FR3", "GTA5", "BA1"),
         id3 = c("", "119934;126821;124679", "", "", "", "119842",  "", "", "", "133678;119930"))

This is what I have:

    > df1
# A tibble: 10 x 3
   id1    id2   id3                   
   <chr>  <chr> <chr>                 
 1 119930 AB1   ""                    
 2 124659 AB2   "119934;126821;124679"
 3 114679 AB3   ""                    
 4 119934 AB5   ""                    
 5 126821 AB4   ""                    
 6 124679 AB0   "119842"       
 7 119842 DF1   ""                    
 8 134863 FR3   ""                    
 9 133678 GTA5  ""                    
10 133675 BA1   "133678;119930" 

Result I need:

    > df1
# A tibble: 10 x 4
   id1    id2   id3                    id4  
   <chr>  <chr> <chr>                  <chr>
 1 119930 AB1   ""                     ""   
 2 124659 AB2   "119934;126821;124679" "AB5;AB4;AB0"   
 3 114679 AB3   ""                     ""   
 4 119934 AB5   ""                     ""   
 5 126821 AB4   ""                     ""   
 6 124679 AB0   ""                     ""   
 7 119842 DF1   ""                     ""   
 8 134863 FR3   "119842"              "DF1"   
 9 133678 GTA5  ""                     ""   
10 133675 BA1   "133678;119930"               "GTA5;AB1"  

Solution

  • Here is a base R solution using strsplit() + match()

    df1$id4 <- apply(df1, 1, function(v) {
      paste0(na.omit(df1$id2[match(unlist(strsplit(v["id3"],split = ";")),df1$id1)]),collapse = ";")
    })
    

    such that

    > df1
    # A tibble: 10 x 4
       id1    id2   id3                  id4         
       <chr>  <chr> <chr>                <chr>       
     1 119930 AB1   ""                   ""          
     2 124659 AB2   "119934;126821;1246~ "AB5;AB4;AB~
     3 114679 AB3   ""                   ""          
     4 119934 AB5   ""                   ""          
     5 126821 AB4   ""                   ""          
     6 124679 AB0   "119842"             "DF1"       
     7 119842 DF1   ""                   ""          
     8 134863 FR3   ""                   ""          
     9 133678 GTA5  ""                   ""          
    10 133675 BA1   "133678;119930"      "GTA5;AB1"