java regex java-8 one-liner

Java Matcher class to implement toFindResult()


According to this question, there is a big difference between find() and matches(), yet both provide results in some form.

As a kind of utility, the toMatchResult() function returns the current results of the matches() operation. I hope my assumption under (1) is valid. (regex is here)

        String line = "aabaaabaaabaaaaaab";
        String regex = "(a*b)a{3}";
        Matcher matcher = Pattern.compile(regex).matcher(line);
        matcher.find();
//        matcher.matches();(1) --> returns false because the regex doesn't match the whole string
        String expectingAab = matcher.group(1);
        System.out.println("actually: " + expectingAab);

Unfortunately, the following does not work at all (exception: No match found):

        String line = "aabaaabaaabaaaaaab";
        String regex = "(a*b)a{3}";
        String expectingAab = Pattern.compile(regex).matcher(line).toMatchResult().group(1);
        System.out.println("actually: " + expectingAab);

Why is that? My first assumption was that it doesn't work because the regex should match the whole string, but the same exception is thrown with the string value aabaaa as well...

Of course, the matcher needs to be put into the correct state with find(), but what if I'd like to use a one-liner for it? I actually implemented a utility class for this:


protected static class FindResult{
    private final Matcher innerMatcher;
    public FindResult(Matcher matcher){
        innerMatcher = matcher;
        innerMatcher.find();
    }
    public Matcher toFindResult(){
        return  innerMatcher;
    }
}

public static void main(String[] args){
    String line = "aabaaabaaabaaaaaab";
    String regex = "(a*b)a{3}";
    String expectingAab = new FindResult(Pattern.compile(regex).matcher(line)).toFindResult().group(1);
    System.out.println("actually: " + expectingAab);
}

I know full well that this is not an optimal way to create a one-liner, especially because it puts a heavy load on the garbage collector.

Is there an easier, better solution for this?

It's worth noting that I'm looking for a Java 8 solution. The matching logic works differently in Java 9 and above.


Solution

  • The toMatchResult() method returns the state of the previous match operation, whether it was find(), lookingAt(), or matches().

    Your line

    String expectingAab = Pattern.compile(regex).matcher(line).toMatchResult().group(1);
    

    does not invoke any of those methods, so it will never have a previous match and will always produce an IllegalStateException: No match found.
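    To make the state dependency concrete, here is a minimal sketch. The class and method names (MatchResultStateDemo, firstGroupAfterFind, throwsWithoutMatchOperation) are made up for illustration; only the java.util.regex calls come from the JDK.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class MatchResultStateDemo {

    // Runs find() first, then takes the snapshot; returns null if nothing was found.
    static String firstGroupAfterFind(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        MatchResult snapshot = matcher.toMatchResult();
        return snapshot.group(1);
    }

    // Reproduces the failing chain from the question: no find(), lookingAt(), or matches() call.
    static boolean throwsWithoutMatchOperation(String regex, String input) {
        try {
            Pattern.compile(regex).matcher(input).toMatchResult().group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String line = "aabaaabaaabaaaaaab";
        String regex = "(a*b)a{3}";
        System.out.println(firstGroupAfterFind(regex, line));         // aab
        System.out.println(throwsWithoutMatchOperation(regex, line)); // true
    }
}
```

    The exception does not depend on the input: even a string the pattern would fully match, such as aabaaa, fails the same way, because no match operation was attempted at all.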

    If you want a one-liner to extract the first group of the first match, you could simply use

    String expectingAab = line.replaceFirst(".*?(a*b)a{3}.*", "$1");
    

    The pattern needs .*? before and .* after the actual match pattern so that the whole string is consumed and replaced, leaving only the first group as the result. The caveat is that if no match exists, it will evaluate to the original string.

    So if you want matches rather than find semantics, you can use

    String expectingNoMatch = line.replaceFirst("^(a*b)a{3}$", "$1");
    

    which will evaluate to the original string with the example input, as it doesn’t match.
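    The two replaceFirst variants can be compared side by side. This is only a sketch: the class ReplaceFirstDemo and its method names are invented for this illustration, and the patterns are the ones shown above.

```java
// Hypothetical demo class wrapping the two one-liners from the answer.
class ReplaceFirstDemo {

    // find-like: the lazy prefix and greedy suffix swallow everything around the first match.
    static String findStyle(String input) {
        return input.replaceFirst(".*?(a*b)a{3}.*", "$1");
    }

    // matches-like: the anchors require the pattern to cover the whole input.
    static String matchesStyle(String input) {
        return input.replaceFirst("^(a*b)a{3}$", "$1");
    }

    public static void main(String[] args) {
        String line = "aabaaabaaabaaaaaab";
        System.out.println(findStyle(line));        // aab
        System.out.println(matchesStyle(line));     // aabaaabaaabaaaaaab (no match, input returned)
        System.out.println(matchesStyle("aabaaa")); // aab
        System.out.println(findStyle("xyz"));       // xyz (no match, input returned)
    }
}
```

    Because a failed match returns the input unchanged, the caller cannot tell "no match" apart from a match whose first group happens to equal the whole input.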

    If you want your utility method not to create a FindResult instance, just use a straightforward static method.
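    One possible shape for such a static method is sketched below. The names RegexUtil and findGroup are assumptions, and returning an Optional for the no-match case is a design choice of this sketch, not something the answer prescribes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical utility class; works on Java 8.
class RegexUtil {

    // Returns the requested group of the first find() hit, or empty if there is no match
    // (or the group did not participate in the match).
    static Optional<String> findGroup(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            return Optional.ofNullable(matcher.group(group));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String line = "aabaaabaaabaaaaaab";
        String expectingAab = RegexUtil.findGroup("(a*b)a{3}", line, 1).orElse(null);
        System.out.println("actually: " + expectingAab); // actually: aab
    }
}
```

    Unlike the FindResult wrapper, this checks the return value of find(), so a missing match surfaces as an empty Optional instead of an IllegalStateException.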

    However, this is a typical case of premature optimization. The Pattern.compile invocation creates a Pattern object, plus a bunch of internal node objects representing the pattern elements, the matcher invocation creates a Matcher instance plus arrays to hold the groups, and the toMatchResult invocation creates another object instance, and of course, the group(1) invocation unavoidably creates a new string instance representing the result.

    The creation of the FindResult instance is the cheapest step in this sequence. If you care about performance, keep the result of Pattern.compile when you use the pattern more than once, as that’s the most expensive operation and the Pattern instance is immutable and shareable, as explicitly stated in its documentation.
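    Keeping the compiled pattern could look like the following sketch. The class name, the constant name, and the choice to return null when nothing matches are all illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example of compiling once and reusing the Pattern.
class CachedPatternDemo {

    // Compiled a single time; Pattern instances are immutable and safe to share.
    private static final Pattern A_STAR_B_THEN_AAA = Pattern.compile("(a*b)a{3}");

    // Only a new Matcher (which is not thread-safe) is created per call.
    static String firstGroup(String input) {
        Matcher matcher = A_STAR_B_THEN_AAA.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup("aabaaabaaabaaaaaab")); // aab
        System.out.println(firstGroup("bbb"));                // null
    }
}
```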

    Of course, the string methods replaceFirst and replaceAll do no magic, but perform the same steps under the hood.