I would like to capitalize everything in a character vector that comes after the first _. For example, the following vector:
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")
Should come out like this:
"NYC_23DF" "BOS_3_RB" "mgh_3_3_F"
I have been trying to play with regular expressions, but am not able to do this. Any suggestions would be appreciated.
You were very close:
gsub("(_.*)","\\U\\1",x,perl=TRUE)
seems to work. You just needed to use _.* (underscore followed by zero or more other characters) rather than _* (zero or more underscores) ...
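As a quick check, here is a minimal sketch that runs the call on the vector from the question. The variable name result is only for illustration, and the sub() comparison rests on the observation that _.* is greedy and consumes the rest of each string, so there is only ever one match per element.

```r
# Vector from the question
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# Capture the first underscore and everything after it, then
# re-insert the captured text in upper case (\\U needs perl = TRUE)
result <- gsub("(_.*)", "\\U\\1", x, perl = TRUE)
print(result)
# [1] "NYC_23DF" "BOS_3_RB" "mgh_3_3_F"

# Because _.* runs to the end of the string, sub() (first match only)
# should give the same answer here
result_sub <- sub("(_.*)", "\\U\\1", x, perl = TRUE)
print(identical(result, result_sub))
```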
To take this apart a bit more:
_.* gives a regular expression pattern that matches an underscore _ followed by any number (including 0) of additional characters; . denotes "any character" and * denotes "zero or more repeats of the previous element".
() denotes that it is a pattern we want to store.
\\1 in the replacement string says "insert the contents of the first matched pattern", i.e. whatever matched _.*
\\U, in conjunction with perl=TRUE, says "put what follows in upper case" (uppercasing _ has no effect; if we wanted to capitalize everything after (for example) a lower-case g, we would need to exclude the g from the stored pattern and include it in the replacement pattern: gsub("g(.*)","g\\U\\1",x,perl=TRUE)).
For more details, search for "replacement" and "capitalizing" in ?gsub (and ?regexp for general information about regular expressions).
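To make the pieces above concrete, the following sketch first wraps the stored pattern in square brackets so you can see exactly what \\1 refers to, and then runs the lower-case g variant on the same vector. The names captured and after_g are hypothetical, chosen only for this illustration.

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# Show what the parentheses store: put brackets around \\1
captured <- sub("(_.*)", "[\\1]", x)
print(captured)
# [1] "NYC[_23df]" "BOS[_3_rb]" "mgh[_3_3_f]"

# Upper-case everything after a lower-case g: the g sits outside the
# parentheses, so it is written back literally in the replacement
after_g <- gsub("g(.*)", "g\\U\\1", x, perl = TRUE)
print(after_g)
# [1] "NYC_23df"  "BOS_3_rb"  "mgH_3_3_F"
```

Only the third element contains a lower-case g, so the first two strings come back unchanged.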