Tags: r, text-mining, stringr

How to replace a character next to a number but not next to a letter in R


I have this string:

char <- "866224; Genoma viral SARS-CoV-2: Detectable; 1096628; Genoma viral SARS-CoV-2: No detectable"

and I need to replace each ";" that follows a number with "|", but keep the ";" that follows text, like this:

"866224| Genoma viral SARS-CoV-2: Detectable; 1096628| Genoma viral SARS-CoV-2: No detectable"

I tried str_replace_all:

str_replace_all(char, "[0-9];", "|")

but it removes the last digit of each number:

"86622| Genoma viral SARS-CoV-2: Detectable; 109662| Genoma viral SARS-CoV-2: No detectable"

Thanks in advance.
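A quick way to see why the attempt above drops a digit is to list what "[0-9];" actually matches. The sketch below uses base R's regmatches() and gregexpr() (not part of the original question) on the same string. Each match contains the digit as well as the semicolon, so str_replace_all() replaces both characters with "|".

```r
char <- "866224; Genoma viral SARS-CoV-2: Detectable; 1096628; Genoma viral SARS-CoV-2: No detectable"

# Every match of "[0-9];" includes the digit before the semicolon,
# so replacing the whole match also deletes that digit.
matches <- regmatches(char, gregexpr("[0-9];", char))[[1]]
print(matches)
#> [1] "4;" "8;"
```

The ";" after "Detectable" is not matched because it is preceded by a letter, which is why it survives in the output.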


Solution

  • The {stringr} package supports lookaheads and lookbehinds, so you can use a lookbehind instead of capturing the last digit and pasting it back:

    library(stringr)
    char <- "866224; Genoma viral SARS-CoV-2: Detectable; 1096628; Genoma viral SARS-CoV-2: No detectable"
    str_replace_all(char, "(?<=[0-9]);", "|")
    #> [1] "866224| Genoma viral SARS-CoV-2: Detectable; 1096628| Genoma viral SARS-CoV-2: No detectable"
    

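    The lookbehind (?<=...) checks whether the text being matched is preceded by ..., without making ... part of the match. Here only the ";" is matched, so the digit before it stays in place. To use this in base R, enable Perl-compatible regular expressions with perl=TRUE: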

    gsub("(?<=[0-9]);", "|", char, perl=TRUE)
    #> [1] "866224| Genoma viral SARS-CoV-2: Detectable; 1096628| Genoma viral SARS-CoV-2: No detectable"
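For comparison, the capture-and-paste approach mentioned in the answer can be written with a backreference: the group ([0-9]) captures the digit, and "\\1" in the replacement puts it back in front of the "|". A minimal base R sketch; the capture version needs no lookbehind, so it also works with the default regex engine (no perl=TRUE).

```r
char <- "866224; Genoma viral SARS-CoV-2: Detectable; 1096628; Genoma viral SARS-CoV-2: No detectable"

# Capture the digit before ";" and re-insert it via the backreference \\1.
with_capture <- gsub("([0-9]);", "\\1|", char)

# Lookbehind version from the answer; base R needs perl = TRUE for (?<=...).
with_lookbehind <- gsub("(?<=[0-9]);", "|", char, perl = TRUE)

identical(with_capture, with_lookbehind)
#> [1] TRUE
```

Both produce the same string here; the lookbehind version simply avoids having to restore the captured digit.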