r, regex, text-mining

How to replace an internal capital letter in a string


I have a range of strings as follows:

vec<-c("Peronospora boniNhenrici","Cystoseira abiesNmarina","Niplommatina rubra",
 "Padina sanctaeNcrucis","Nachygrapsus NaurusNliguricus","Melphidippa borealis")

I would like to replace the internal capital "N" in the second word of each element with "-", so that it would look like:

("Peronospora boni-henrici","Cystoseira abies-marina","Niplommatina rubra",
 "Padina sanctae-crucis","Nachygrapsus Naurus-liguricus","Melphidippa borealis")

Any suggestions? I've got the locations using the following:

stri_locate_all(vec,regex = "[N]")

but I'm not sure how to replace the "N" if it's internal. When I try to replace the capital letter "N" using gsub, it replaces all occurrences of N, rather than only the internal "N".


Solution

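  • We can look for any N that has a \w on both sides; in regex, \w matches any letter, digit, or underscore. An N at the start of a word has a space (or nothing) before it, so it is left alone. In an R string the backslash must be doubled (\\w), and \\1 and \\2 in the replacement put back the characters captured on either side of the N. If \w is too broad, you could replace it with [a-zA-Z] to match only letters: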

    stringr::str_replace_all(vec, "(\\w)N(\\w)", "\\1-\\2")
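
A possible alternative, sketched here in base R rather than taken from the answer above: lookbehind and lookahead assertions check the neighbouring characters without consuming them, so only the N itself is replaced and no backreferences are needed. This assumes `perl = TRUE`, which base R's `gsub` needs for lookarounds. The `"aNbNc"` string is a made-up input, added only to show the one case where the two approaches differ.

```r
vec <- c("Peronospora boniNhenrici", "Cystoseira abiesNmarina", "Niplommatina rubra",
         "Padina sanctaeNcrucis", "Nachygrapsus NaurusNliguricus", "Melphidippa borealis")

# (?<=[A-Za-z]) : the character before N must be a letter (not consumed)
# (?=[A-Za-z])  : the character after N must be a letter (not consumed)
# Lookarounds are only available in gsub() with perl = TRUE.
result <- gsub("(?<=[A-Za-z])N(?=[A-Za-z])", "-", vec, perl = TRUE)
print(result)

# Hypothetical input with two internal N's separated by a single letter.
# Because the lookarounds do not consume "b", both N's are replaced.
chained <- gsub("(?<=[A-Za-z])N(?=[A-Za-z])", "-", "aNbNc", perl = TRUE)
print(chained)  # "a-b-c"
```

With the capture-group pattern, the letter between two such N's is consumed by the first match, so the second N would be left in place. None of the strings in the question have that shape, so either approach gives the requested result for `vec`.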