Search code examples
javaregexstringquantifiers

How does the matcher traverse the string while looking for matching characters?


The quantifier a? is supposed to match a single or no occurrence of a. The given program uses java.util.regex package to match a regex with a string.

My question is about the output of the program/result of the pattern matching:

Output of the program:-

Enter your regex: a? Enter input string to search: a I found the text "a" starting at index 0 and ending at index 1. I found the text "" starting at index 1 and ending at index 1.

Question:-

It is supposed to match one or zero occurrence of a. So shouldn't it match a zero-length "" (that is absence/zero times occurrence of a) starting and ending at index 0, then match a starting at index 0 and ending at index 0, and then a "" starting and ending at index 1? I think it should.

This way it seems like the matcher keeps looking for a's though-out the string, and then when it is sure that there are no more a's (that is the end of the string? ) then it looks for the zero occurrence/ absence of a? I think that would be tedious and that would not be the case. But then it should find a "" starting and ending at 0 before it matches a starting at index 0 and ending at index 1?

Program:-

import java.io.InputStreamReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 *  Enter your regex: foo
 *  Enter input string to search: foo
 *  I found the text foo starting at index 0 and ending at index 3.
 * */

public class RegexTestHarness {

    public static void main(String[] args){

        /*Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }*/

        while (true) {

            /*Pattern pattern = 
            Pattern.compile(console.readLine("%nEnter your regex: ", null));*/

            System.out.print("\nEnter your regex: ");

            Scanner scanner = new Scanner(new InputStreamReader(System.in));

            Pattern pattern = Pattern.compile(scanner.next());

            System.out.print("\nEnter your input string to seacrh: ");

            Matcher matcher = 
            pattern.matcher(scanner.next());

            System.out.println();

            boolean found = false;
            while (matcher.find()) {
                /*console.format("I found the text" +
                    " \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());*/

                System.out.println("I found the text \"" + matcher.group() + "\" starting at index " + matcher.start() + " and ending at index " + matcher.end() + "."); 

                found = true;
            }
            if(!found){
                //console.format("No match found.%n", null);
                System.out.println("No match found."); 
            }
        }
    }
}

Solution

  • ? quantifier is greedy which means it will try to find maximal possible match. Since matched part can't be reused you are not able to match empty string "" before a (you can think of it as par of first greedy match) but you are able to match empty string after it.

    You can make it reluctant by adding ? after this quantifier which will make it try find minimal possible match. So if you will try to find match of regex a?? you will see 0 as index of first match (which will be empty string before a).