I'm trying to learn Java regular expressions. I want to match a pattern with several capturing groups (i.e. j(a(va))) against another string (i.e. this is java. this is ava, this is va). I was expecting the output to be:
I found the text "java" starting at index 8 and ending at index 12.
I found the text "ava" starting at index 21 and ending at index 24.
I found the text "va" starting at index 34 and ending at index 36.
Number of group: 2
However, the IDE only outputs:
I found the text "java" starting at index 8 and ending at index 12.
Number of group: 2
Why is this the case? Is there something I am missing?
Original code:
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("\nEnter your regex:");
Pattern pattern = Pattern.compile(br.readLine());
System.out.println("\nEnter input string to search:");
Matcher matcher = pattern.matcher(br.readLine());
boolean found = false;
while (matcher.find()) {
    System.out.format("I found the text"
            + " \"%s\" starting at "
            + "index %d and ending at index %d.%n",
            matcher.group(),
            matcher.start(),
            matcher.end());
    found = true;
    System.out.println("Number of group: " + matcher.groupCount());
}
if (!found) {
    System.out.println("No match found.");
}
After running the code above, I entered the following input:
Enter your regex:
j(a(va))
Enter input string to search:
this is java. this is ava, this is va
And the IDE outputs:
I found the text "java" starting at index 8 and ending at index 12.
Number of group: 2
Your regexp only matches the complete text java; it doesn't match ava or va. When it matches java, it will set capture group 1 to ava and capture group 2 to va, but it doesn't match those strings on their own. The regexp that would produce the result you want is:
j?(a?(va))
The ? makes the preceding item optional, so it will match the later items without these prefixes.
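To make the difference concrete, here is a minimal sketch that runs both patterns against the input from the question. The class name GroupDemo and the helpers findAll and groupsOfFirstMatch are made up for this illustration; only Pattern, Matcher, find(), group(), start(), end() and groupCount() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class GroupDemo {
    static final String INPUT = "this is java. this is ava, this is va";

    // Collects one "text@start-end" entry for every successful find().
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            results.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return results;
    }

    // Collects capture groups 1..groupCount() of the first match (empty list if none).
    static List<String> groupsOfFirstMatch(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // The original pattern matches once; the groups are pieces of that one match.
        System.out.println(findAll("j(a(va))", INPUT));            // [java@8-12]
        System.out.println(groupsOfFirstMatch("j(a(va))", INPUT)); // [ava, va]
        // With optional prefixes, find() succeeds three times.
        System.out.println(findAll("j?(a?(va))", INPUT));          // [java@8-12, ava@22-25, va@35-37]
    }
}
```

Note that groupCount() reports how many capturing groups the pattern contains, not how many matches were found, which is why it prints 2 in both cases. Also, the real indices for ava and va are 22 to 25 and 35 to 37, one higher than the numbers expected in the question.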