I have a vector with sample locations, here's a sample:
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
"Aa, Middelrode t2", "Aa, Middelrode p1",
"Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
"Aa, test1 T1", "Aa, test2 t1")
These strings are made up of a location name ("Aa, Heeswijk"), a route code ("T1", "p2", "t3") and sometimes a subroute ("a" or "b"). Unfortunately, the route codes (t1, t2, p1, t1a) are sometimes in upper and sometimes in lower case. I want all the route codes in UPPER case, leaving the name and subroute unchanged. My expected outcome is:
"Aa, Heeswijk T1", "Aa, Heeswijk T1",
"Aa, Middelrode T2", "Aa, Middelrode P1",
"Aa, Heeswijk T1a", "Aa, Heeswijk T3b",
"Aa, test1 T1", "Aa, test2 T1"
I have looked at toupper(), but that changes the whole string. I could also use gsub:
gsub("t1","T1", test)
gsub("t2","T2", test)
#etc.
But there must be a better R-ish way?!
Note: Route codes are always 2 characters long, consist of a letter followed by a digit, and are preceded by a space. So the letter to convert to upper case is always the second or third character from the end of the string.
We can use a regex lookahead. We match a lower case letter at a word boundary (\\b), capture it as a group (using parentheses), and use a lookahead ((?=[0-9])) to require that a digit follows without consuming it. In the replacement we use \\U followed by the capture group to convert it to upper case.
sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl=TRUE)
#[1] "Aa, Heeswijk T1" "Aa, Heeswijk T1" "Aa, Middelrode T2"
#[4] "Meander Assendelft P1" "Aa, Heeswijk T1a" "Aa, Heeswijk T3b"
Or without using the lookarounds, we can do this with two capture groups.
sub('\\b([a-z])([0-9])', '\\U\\1\\2', test, perl=TRUE)
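As a quick sanity check, the two patterns should agree on the sample vector from the question. This is only a sketch: 'test' is redefined here so the snippet runs on its own, and the names with_lookahead and with_groups are made up for illustration.

```r
test <- c("Aa, Heeswijk T1", "Aa, Heeswijk t1",
          "Aa, Middelrode t2", "Aa, Middelrode p1",
          "Aa, Heeswijk t1a", "Aa, Heeswijk t3b",
          "Aa, test1 T1", "Aa, test2 t1")

# Lookahead version: only the letter is consumed, the digit is merely asserted
with_lookahead <- sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl = TRUE)

# Two-group version: the digit is consumed and written back via \\2;
# \\U has no visible effect on a digit
with_groups <- sub('\\b([a-z])([0-9])', '\\U\\1\\2', test, perl = TRUE)

identical(with_lookahead, with_groups)
#[1] TRUE
```

Note that the "t" in "test1" is left alone in both versions, because the letter directly before the digit there is not at a word boundary.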
Testing with the updated 'test' from the OP's post
sub('\\b([a-z])(?=[0-9])', '\\U\\1', test, perl=TRUE)
#[1] "Aa, Heeswijk T1" "Aa, Heeswijk T1" "Aa, Middelrode T2"
#[4] "Aa, Middelrode P1" "Aa, Heeswijk T1a" "Aa, Heeswijk T3b"
#[7] "Aa, test1 T1" "Aa, test2 T1"
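Since the note in the question says the route code always sits at the end of the string and is preceded by a space, the pattern could also be anchored there instead of relying on \\b. A minimal sketch under that assumption, also assuming the subroute is at most one lower case letter; fix_route is a hypothetical helper name, not something from the question:

```r
# Hypothetical helper: upper-case the route letter at the end of each string.
# Assumes: space, one letter, one digit, optional one-letter subroute, end of string.
fix_route <- function(x) {
  # \\U starts upper-casing, \\E stops it so the subroute (part of \\2) keeps its case
  sub(' ([a-z])([0-9][a-z]?)$', ' \\U\\1\\E\\2', x, perl = TRUE)
}

fix_route(c("Aa, Heeswijk t1a", "Aa, test2 t1", "Aa, test1 T1"))
# gives "Aa, Heeswijk T1a", "Aa, test2 T1", "Aa, test1 T1"
```

Anchoring with $ makes the intent explicit and avoids touching a letter-digit pair elsewhere in the name, at the cost of silently skipping strings that do not follow the stated layout.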