The quantifier a?
is supposed to match a single or no occurrence of a
. The given program uses java.util.regex
package to match a regex with a string.
My question is about the output of the program/result of the pattern matching:
Output of the program:-
Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
Question:-
It is supposed to match one or zero occurrence of a. So shouldn't it match a zero-length ""
(that is absence/zero times occurrence of a
) starting and ending at index 0
, then match a
starting at index 0
and ending at index 0
, and then a ""
starting and ending at index 1
? I think it should.
This way it seems like the matcher
keeps looking for a
's though-out the string, and then when it is sure that there are no more a
's (that is the end
of the string? ) then it looks for the zero occurrence
/ absence of a
? I think that would be tedious and that would not be the case. But then it should find a "" starting and ending at 0
before it matches a starting at index 0
and ending at index 1
?
Program:-
import java.io.InputStreamReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/*
* Enter your regex: foo
* Enter input string to search: foo
* I found the text foo starting at index 0 and ending at index 3.
* */
public class RegexTestHarness {
public static void main(String[] args){
/*Console console = System.console();
if (console == null) {
System.err.println("No console.");
System.exit(1);
}*/
while (true) {
/*Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: ", null));*/
System.out.print("\nEnter your regex: ");
Scanner scanner = new Scanner(new InputStreamReader(System.in));
Pattern pattern = Pattern.compile(scanner.next());
System.out.print("\nEnter your input string to seacrh: ");
Matcher matcher =
pattern.matcher(scanner.next());
System.out.println();
boolean found = false;
while (matcher.find()) {
/*console.format("I found the text" +
" \"%s\" starting at " +
"index %d and ending at index %d.%n",
matcher.group(),
matcher.start(),
matcher.end());*/
System.out.println("I found the text \"" + matcher.group() + "\" starting at index " + matcher.start() + " and ending at index " + matcher.end() + ".");
found = true;
}
if(!found){
//console.format("No match found.%n", null);
System.out.println("No match found.");
}
}
}
}
?
quantifier is greedy which means it will try to find maximal possible match. Since matched part can't be reused you are not able to match empty string ""
before a
(you can think of it as par of first greedy match) but you are able to match empty string after it.
You can make it reluctant by adding ?
after this quantifier which will make it try find minimal possible match. So if you will try to find match of regex a??
you will see 0
as index of first match (which will be empty string before a
).