
R: Capitalizing everything after a certain character


I would like to capitalize everything that comes after the first _ in each element of a character vector. For example, the following vector:

x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f") 

Should come out like this:

"NYC_23DF" "BOS_3_RB" "mgh_3_3_F"

I have been experimenting with regular expressions but haven't managed to get this to work. Any suggestions would be appreciated.


Solution

  • You were very close:

    gsub("(_.*)", "\\U\\1", x, perl = TRUE)
    

    works as required. You just needed _.* (an underscore followed by zero or more characters of any kind) rather than _* (zero or more underscores).
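    A self-contained version of the one-liner, using the vector from the question, for checking the result. It is a sketch that assumes only base R; the names res and res_star are just for illustration. The second call shows why the original _* attempt does nothing visible:

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# (_.*) captures the first underscore and everything after it;
# .* is greedy, so the match runs to the end of the string.
# \\U (Perl regex only) upper-cases the rest of the replacement,
# which here is just the captured text \\1.
res <- gsub("(_.*)", "\\U\\1", x, perl = TRUE)
res
# "NYC_23DF" "BOS_3_RB" "mgh_3_3_F"

# _* only ever matches runs of underscores (or the empty string),
# and upper-casing "_" changes nothing, so x comes back unchanged.
res_star <- gsub("(_*)", "\\U\\1", x, perl = TRUE)
identical(res_star, x)
```

    Because the greedy match consumes the rest of each string, there is at most one match per element, so sub() gives the same result as gsub() here.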

    To take this apart a bit more:

    • _.* gives a regular expression pattern that matches an underscore _ followed by any number (including 0) of additional characters; . denotes "any character" and * denotes "zero or more repeats of the previous element"
    • wrapping this pattern in parentheses () makes it a capture group, so whatever it matches is stored for use in the replacement
    • \\1 in the replacement string is a back-reference: it says "insert the text captured by the first group", i.e. whatever matched _.*
    • \\U, which is only available with perl=TRUE, says "convert the rest of the replacement to upper case". Upper-casing the _ itself has no effect, so it can safely sit inside the captured pattern. If we instead wanted to capitalize everything after (for example) a lower-case g, we would need to leave the g out of the captured pattern and write it back literally in the replacement, so that it stays lower case: gsub("g(.*)","g\\U\\1",x,perl=TRUE)
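    The points above can be checked on a slightly extended vector. The extra element "no-underscore" is an assumption added for illustration, to show that strings with no match pass through unchanged; the second call is the lower-case g variant from the last bullet:

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f", "no-underscore")

# No "_" in the last element, so nothing matches and it is returned as-is.
res_u <- gsub("(_.*)", "\\U\\1", x, perl = TRUE)

# Keep the "g" outside the capture group and write it back literally,
# so only the text after the first lower-case "g" is upper-cased.
res_g <- gsub("g(.*)", "g\\U\\1", x, perl = TRUE)

res_u  # "NYC_23DF" "BOS_3_RB" "mgh_3_3_F" "no-underscore"
res_g  # "NYC_23df" "BOS_3_rb" "mgH_3_3_F" "no-underscore"
```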

    For more details, search for "replacement" and "capitalizing" in ?gsub, and see ?regexp for general information about regular expressions.
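    If you would rather not depend on the Perl-only \\U escape, the same result can be assembled from base R string functions. This is an alternative technique, not the one described above, and the helper name upper_after_underscore is hypothetical:

```r
# Hypothetical helper: upper-cases everything from the first "_" onward
# using default (non-Perl) regular expressions plus toupper().
upper_after_underscore <- function(x) {
  prefix <- sub("_.*", "", x)     # text before the first "_" (whole string if none)
  suffix <- sub("^[^_]*", "", x)  # first "_" and everything after it ("" if none)
  paste0(prefix, toupper(suffix))
}

upper_after_underscore(c("NYC_23df", "BOS_3_rb", "mgh_3_3_f", "no-underscore"))
# "NYC_23DF" "BOS_3_RB" "mgh_3_3_F" "no-underscore"
```

    Splitting each string into a prefix and a suffix keeps the case conversion in toupper(), which some readers may find easier to follow than replacement escapes.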