Tags: r, regex, stringr, stringi

Add comma after first word starting with a capital letter


As the title says. I have a bunch of names and I need to add a comma after the first word that starts with a capital letter.

An example:

txt <- c( "de Van-Smith J", "van der Smith G.H.", "de Smith JW", "Smith JW")

The result should be:

[1] "de Van-Smith, J" "van der Smith, G.H." "de Smith, JW" "Smith, JW"  

I have mainly been trying gsub() and stringr::str_replace(), but I am struggling with the regex. Any advice would be appreciated.


Solution

  • You can use sub() with perl = TRUE:

    sub("([A-Z][\\w-]+)", "\\1,", txt, perl = TRUE)
    
    #[1] "de Van-Smith, J"   "van der Smith, G.H." "de Smith, JW"       "Smith, JW"
    

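    Here ([A-Z][\\w-]+) captures the first word that starts with an uppercase letter, followed by one or more word characters or hyphens. The replacement "\\1," puts the captured word back with a comma after it. Because sub() replaces only the first match in each string, later capitalised tokens such as the initials are left untouched. perl = TRUE is needed so that \\w is recognised inside the character class.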
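To check the behaviour described above, here is a minimal sketch in base R. The helper name add_comma is made up for illustration and is not part of the original answer. It wraps the same sub() call, then contrasts it with gsub() to show why replacing only the first match matters.

```r
# Hypothetical helper (name chosen for this example only): wraps the sub() call above.
add_comma <- function(x) {
  # sub() replaces only the first match in each string.
  sub("([A-Z][\\w-]+)", "\\1,", x, perl = TRUE)
}

txt <- c("de Van-Smith J", "van der Smith G.H.", "de Smith JW", "Smith JW")

result <- add_comma(txt)
print(result)
# [1] "de Van-Smith, J"     "van der Smith, G.H." "de Smith, JW"        "Smith, JW"

# gsub() replaces every match, so multi-letter initials such as "JW" get a comma too.
all_matches <- gsub("([A-Z][\\w-]+)", "\\1,", txt[3], perl = TRUE)
print(all_matches)
# [1] "de Smith, JW,"
```

Note that the pattern requires at least one character after the capital letter, so a name consisting of a single capital letter would not match. stringr::str_replace() also replaces only the first match, so the same pattern should carry over, though that variant was not run here.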