Search code examples
rstringdplyrsubstring

How to arrange the values in a column to a new column based on a condition and a substring?


I have a data frame including three columns as below. I need to add a new column based on the following condition: If in var_field, the string following $ equals the string in text, put the corresponding value in var in the new column called new_col. When text equals NA, the new_col should remain NA as well. I would really appreciate your advice.

var text var_field
A happy A$excited
B sad B$angry
C angry C$sad
D excited D$happy
E NA E$nervous
F NA F$blue
G NA G$lonely

The expected new column should look like column "new_col".

var text var_field new_col
A happy A$excited D
B sad B$angry C
C angry C$sad B
D excited D$happy A
E NA E$nervous NA
F NA F$blue NA
G NA G$lonely NA

Solution

  • For 1st(!) match in base R:

    df_ <- read.table(header = T, text = "
    var text    var_field
    A   happy   A$excited
    B   sad B$angry
    C   angry   C$sad
    D   excited D$happy
    E   NA  E$nervous
    F   NA  F$blue
    G   NA  G$lonely")
    
    
    suffix <- sapply(strsplit(df_$var_field, "$", fixed = TRUE), `[`, 2)
    df_$new_col <- df_$var[match(df_$text, suffix)]
    df_
    #>   var    text var_field new_col
    #> 1   A   happy A$excited       D
    #> 2   B     sad   B$angry       C
    #> 3   C   angry     C$sad       B
    #> 4   D excited   D$happy       A
    #> 5   E    <NA> E$nervous    <NA>
    #> 6   F    <NA>    F$blue    <NA>
    #> 7   G    <NA>  G$lonely    <NA>
    

    Created on 2023-06-08 with reprex v2.0.2