r regex string stringr

Adding a letter to a string vector in R


Ignoring the comment lines in my input (the ones starting with ##), is there a way to add:

  • A* before "alphanumeric" elements that don't have a * prefix and appear after ~ and
  • R* before "alphanumeric" elements that don't have a * prefix and appear after ~~?

Desired output is shown below.

input <- "y ~ x1 + x2
           ## Variances of x1 and x2 are 1
           x1 ~~ 1*x1
           x2 ~~ 1*x2
           ## x1 and x2 are correlated
           x1 ~~ x2"

output <- "y ~ A*x1 + A*x2
           ## Variances of x1 and x2 are 1
           x1 ~~ 1*x1
           x2 ~~ 1*x2
           ## x1 and x2 are correlated
           x1 ~~ R*x2"

Solution

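  • You can use stringr::str_replace_all() with capture groups. The example below applies three replacements in sequence: one for the term right after a single ~, one for each term after a +, and one for the term after ~~. Each pattern requires a lowercase letter immediately after the operator, so terms that already carry a prefix starting with a digit, such as 1*x1, are left untouched. The native pipe |> needs R 4.1 or later.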

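    library(stringr)
    
    res <-
      # Stage 1: first term after a single ~ (the surrounding spaces exclude ~~)
      str_replace_all(input,
                      " ~ ([a-z])",
                      " ~ A*\\1") |>
      # Stage 2: every further term introduced by +
      str_replace_all("\\+ ([a-z])",
                      "+ A*\\1") |>
      # Stage 3: term after ~~ that starts with a letter
      str_replace_all("~~ ([a-z])",
                      "~~ R*\\1")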
    
    cat(res)
    #> y ~ A*x1 + A*x2
    #>          ## Variances of x1 and x2 are 1
    #>          x1 ~~ 1*x1
    #>          x2 ~~ 1*x2
    #>          ## x1 and x2 are correlated
    #>          x1 ~~ R*x2
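
The three patterns above rely on the comment lines not containing `~` or `+` followed by a lowercase letter, and on terms starting with a lowercase letter. If that cannot be guaranteed, one possible variant is to split the input into lines, skip the `##` lines explicitly, and check for an existing `*` prefix with a lookahead. The sketch below uses base R with `perl = TRUE`; the function name `add_labels()` and the assumption that a term is a letter followed by letters, digits, or underscores are mine, not part of the original answer.

```r
# Hypothetical helper: label unprefixed terms, leaving comment lines alone.
add_labels <- function(model) {
  lines <- strsplit(model, "\n", fixed = TRUE)[[1]]

  # TRUE for lines that are not comments (a comment starts with optional spaces, then #)
  is_code <- !grepl("^\\s*#", lines)

  # A name starting with a letter that is not already followed by `*`
  # (assumed naming rule: letters, digits, underscores)
  name <- "([A-Za-z][A-Za-z0-9_]*)\\b(?!\\s*\\*)"

  # Lines with a single ~ : label the term after ~ and every term after +
  reg <- is_code & grepl("(?<!~)~(?!~)", lines, perl = TRUE)
  lines[reg] <- gsub(paste0("([~+]\\s*)", name), "\\1A*\\2",
                     lines[reg], perl = TRUE)

  # Lines with ~~ : label the term after ~~
  cov <- is_code & grepl("~~", lines, fixed = TRUE)
  lines[cov] <- gsub(paste0("(~~\\s*)", name), "\\1R*\\2",
                     lines[cov], perl = TRUE)

  paste(lines, collapse = "\n")
}

input <- "y ~ x1 + x2
           ## Variances of x1 and x2 are 1
           x1 ~~ 1*x1
           x2 ~~ 1*x2
           ## x1 and x2 are correlated
           x1 ~~ x2"

res <- add_labels(input)
cat(res)
```

On the input from the question this should give the same result as the three-stage version. The difference is that a comment such as `## y ~ x1` would stay unchanged, and names starting with an uppercase letter would also be labelled.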