java, regex, match, capturing-group

Matching several capturing groups does not give the expected results


I'm trying to learn Java regular expressions. I want to match a pattern with several capturing groups (i.e. j(a(va))) against a string (i.e. this is java. this is ava, this is va). I was expecting the output to be:

I found the text "java" starting at index 8 and ending at index 12.
I found the text "ava" starting at index 21 and ending at index 24.    
I found the text "va" starting at index 34 and ending at index 36.
Number of group: 2

However, the IDE only outputs:

I found the text "java" starting at index 8 and ending at index 12.
Number of group: 2

Why is this the case? Is there something I am missing?

Original code:

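    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class name is illustrative; the posted snippet omitted the enclosing class.
    public class RegexTest {
        public static void main(String[] args) throws IOException {
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

            System.out.println("\nEnter your regex:");
            Pattern pattern = Pattern.compile(br.readLine());

            System.out.println("\nEnter input string to search:");
            Matcher matcher = pattern.matcher(br.readLine());

            boolean found = false;
            // Each find() call advances to the next match of the whole pattern.
            while (matcher.find()) {
                System.out.format("I found the text"
                        + " \"%s\" starting at "
                        + "index %d and ending at index %d.%n",
                        matcher.group(),   // group 0: the entire match
                        matcher.start(),
                        matcher.end());
                found = true;
                // groupCount() is the number of capturing groups in the pattern.
                System.out.println("Number of group: " + matcher.groupCount());
            }
            if (!found) {
                System.out.println("No match found.");
            }
        }
    }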

After running the code above, I have entered the following input:

Enter your regex:
j(a(va))

Enter input string to search:
this is java. this is ava, this is va

And the IDE outputs:

I found the text "java" starting at index 8 and ending at index 12.
Number of group: 2

Solution

  • Your regexp only matches the whole string java; it doesn't match ava or va on their own. Each call to find() returns one match of the entire pattern. When it matches java, it sets capture group 1 to ava and capture group 2 to va, which you can read with matcher.group(1) and matcher.group(2), but those groups are not reported as separate matches. A regexp that would produce the result you want is:

    j?(a?(va))

    The ? makes the preceding item optional, so the pattern also matches the later occurrences that lack the j or ja prefix.

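To see the difference between a match and a capturing group, the following sketch may help. It runs both patterns against the question's input and lists, for every match that find() reports, the whole match with its indices and the text of each capturing group. The class name GroupDemo and the helper describeMatches are made up for illustration; only the java.util.regex calls come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
public class GroupDemo {

    // Returns one line per match reported by find(): the whole match (group 0)
    // with its start/end indices, followed by each capturing group's text.
    public static List<String> describeMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            StringBuilder sb = new StringBuilder();
            sb.append(String.format("\"%s\" [%d,%d)",
                    matcher.group(), matcher.start(), matcher.end()));
            // Groups are numbered from 1; group 0 is the entire match.
            for (int g = 1; g <= matcher.groupCount(); g++) {
                sb.append(String.format(" group %d=\"%s\"", g, matcher.group(g)));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "this is java. this is ava, this is va";
        // Original pattern: a single match, with two groups inside it.
        describeMatches("j(a(va))", input).forEach(System.out::println);
        System.out.println("---");
        // Pattern with optional prefixes: three separate matches.
        describeMatches("j?(a?(va))", input).forEach(System.out::println);
    }
}
```

With the sample input, the first pattern yields one match (java at 8 to 12, with groups ava and va), while the second yields three matches: java at 8 to 12, ava at 22 to 25, and va at 35 to 37. Note that the last two index pairs differ by one from those in the question's expected output. Keep in mind that j?(a?(va)) also matches any va inside other words, so it may need word boundaries (\b) for real text.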