I have this kind of input:
word w'ord wo'rd
I need to uppercase both the first character of a word and the character right after each '
character (which can occur multiple times in a word).
The output I need (using the previous example) is:
word W'Ord Wo'Rd
I tried a simple pattern:
s.replaceAll("(\\w)(\\w*)'(\\w)", "$1");
but I'm unable to convert groups 1 and 3 to uppercase.
EDIT: After I discovered a small mistake in the question, I adapted @Wiktor Stribizew's code to cover the case I had missed:
Matcher m = Pattern.compile("(\\w)(\\w*)'(\\w)").matcher(s);
StringBuffer result = new StringBuffer();
while (m.find()) {
    m.appendReplacement(result, m.group(1).toUpperCase() + m.group(2) + "'" + m.group(3).toUpperCase());
}
m.appendTail(result);
s = result.toString();
In Java, you need Matcher#appendReplacement to process each match before it is written to the result. Here is an example:
String s = "word w'ord wo'rd";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)").matcher(s);
while (m.find()) {
    m.appendReplacement(result,
            m.group(1).toUpperCase() + m.group(2) + "'" + m.group(3).toUpperCase());
}
m.appendTail(result);
System.out.println(result.toString());
// => word W'Ord Wo'Rd
See the Java demo
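For reuse, the loop above can be wrapped in a small helper. The following is only a sketch: the class name ApostropheCapitalizer and the method name capitalize are made up for illustration, and the Matcher.quoteReplacement call is an extra safeguard that the snippet above does not need, since \w can never capture $ or \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not part of any library.
class ApostropheCapitalizer {
    // Same pattern as in the answer, compiled once and reused.
    private static final Pattern WORD_WITH_APOSTROPHE =
            Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)");

    // Uses StringBuffer so that it also compiles on Java 8.
    static String capitalize(String input) {
        Matcher m = WORD_WITH_APOSTROPHE.matcher(input);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            String replacement =
                    m.group(1).toUpperCase() + m.group(2) + "'" + m.group(3).toUpperCase();
            // quoteReplacement keeps '$' and '\' literal in the replacement string.
            m.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match.
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("word w'ord wo'rd")); // word W'Ord Wo'Rd
        System.out.println(capitalize("wo'r'd"));           // Wo'R'D
    }
}
```

Words without an apostrophe never match the pattern, so appendTail and the gaps between matches carry them over unchanged.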
Java 9+ equivalent (demo):
String s = "wo'rd w'ord wo'r'd";
Matcher m = Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)").matcher(s);
System.out.println(
    m.replaceAll(r -> r.group(1).toUpperCase() + r.group(2) + "'" + r.group(3).toUpperCase())
);
// wo'rd w'ord wo'r'd => Wo'Rd W'Ord Wo'R'D
// word w'ord wo'rd => word W'Ord Wo'Rd
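To check what each capturing group of this pattern actually holds, the matches can be listed one by one. This is a small sketch assuming Java 9+ (it relies on Matcher.results()); the class name GroupInspector and the method describeMatches are invented for the illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical diagnostic class; the names are illustrative only.
class GroupInspector {
    private static final Pattern P =
            Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)");

    // Returns one line per match: the full match followed by groups 1, 2 and 3.
    static List<String> describeMatches(String input) {
        return P.matcher(input).results() // Stream<MatchResult>, Java 9+
                .map(r -> r.group() + " -> [" + r.group(1) + "][" + r.group(2) + "][" + r.group(3) + "]")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describeMatches("wo'rd w'ord wo'r'd").forEach(System.out::println);
        // wo'r -> [w][o][r]
        // w'o -> [w][][o]
        // wo'r'd -> [w][o][r'd]
    }
}
```

Note that a match ends right after the last apostrophe-plus-char pair, so the trailing "d" of wo'rd is not part of the match and is left as it is.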
Pattern break-down:
- \b - a leading word boundary
- (\w) - Group 1: a single word char
- (\w*) - Group 2: zero or more word chars
- ' - a single quote
- (\w(?:'\w)*) - Group 3:
  - \w - a word char
  - (?:'\w)* - zero or more sequences of:
    - ' - a single quote
    - \w - a word char.

Now, if you want to make the pattern more precise, you can replace each \w that is supposed to match a lowercase letter with \p{Ll} and the \w that is supposed to match any letter with \p{L}. The pattern would then look like "(?U)\\b(\\p{Ll})(\\p{L}*)'(\\p{Ll}(?:'\\p{Ll})*)". However, you risk leaving some letters after ' in lowercase if an uppercase letter comes before them (as in wo'r'D's -> Wo'R'D's). (?U) is the inline form of the Pattern.UNICODE_CHARACTER_CLASS flag, which makes the \b word boundary Unicode-aware.
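The trade-off between the two patterns can be seen by running them side by side. The sketch below assumes Java 9+ (for Matcher.replaceAll with a function); the class name StrictPatternDemo and the methods lenient and strict are hypothetical names used only for this comparison. The second input is l'été written with Unicode escapes so that the source file encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; the names are illustrative only.
class StrictPatternDemo {
    // Pattern from the answer: any word char may surround the apostrophe.
    private static final Pattern LENIENT =
            Pattern.compile("\\b(\\w)(\\w*)'(\\w(?:'\\w)*)");
    // Stricter variant: only lowercase letters are candidates for upper-casing.
    private static final Pattern STRICT =
            Pattern.compile("(?U)\\b(\\p{Ll})(\\p{L}*)'(\\p{Ll}(?:'\\p{Ll})*)");

    private static String apply(Pattern p, String input) {
        // Matcher.replaceAll(Function) requires Java 9+.
        return p.matcher(input).replaceAll(
                r -> r.group(1).toUpperCase() + r.group(2) + "'" + r.group(3).toUpperCase());
    }

    static String lenient(String input) { return apply(LENIENT, input); }

    static String strict(String input) { return apply(STRICT, input); }

    public static void main(String[] args) {
        // The uppercase D stops the strict match, so the final s stays lowercase.
        System.out.println(lenient("wo'r'D's")); // Wo'R'D'S
        System.out.println(strict("wo'r'D's"));  // Wo'R'D's
        // By default \w is ASCII-only, so the lenient pattern does not match here.
        String ete = "l'\u00e9t\u00e9";
        System.out.println(lenient(ete).equals(ete)); // true
        System.out.println(strict(ete).equals("L'\u00c9t\u00e9")); // true
    }
}
```

Which variant fits better depends on the input: the \w version handles mixed-case words more uniformly, while the \p{Ll} version is the one that recognizes non-ASCII letters.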