Tags: java, regex, regex-group

Regex doesn't match, but only in Java


I have a problem with the following regular expression. I have this string that I want to match:

String test = "Server returned HTTP response code: 403 for URL: https://an.url.example";

I want to "extract" the error code from this string using this regex:

Pattern pattern = Pattern.compile(".*\\s([\\d]{3})\\s.*");
Matcher matcher = pattern.matcher(test);
System.out.println(matcher.group(1));

But when I run this code I always get this exception:

java.lang.IllegalStateException: No match found
at java.base/java.util.regex.Matcher.checkMatch(Matcher.java:1852)
at java.base/java.util.regex.Matcher.group(Matcher.java:687)...

Then I thought my regex was wrong, so I tested it on a website.

[Image: the regex matching the HTTP response]

Now I am even more confused and I really don't know what I did wrong (Maybe the method calls?). I just hope someone can help me and clear up my confusion. Have a great day!


Solution

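  • The Matcher API works like this:

    You create your matcher, then you need to call a method to actually run it. Until find(), matches(), or lookingAt() has succeeded, group() throws the IllegalStateException you are seeing. The idea is that you can write something like this: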

    while (matcher.find()) {
      int number = Integer.parseInt(matcher.group(1));
      System.out.println("Found a number: " + number);
    }
    

    In your case, you just care about a single find. Hence, you probably want if, not while, here:

    Matcher matcher = pattern.matcher(test);
    if (matcher.find()) {
      System.out.println(matcher.group(1));
    } else {
      // What do you want to do if 'test' does not contain a match?
    }
    

    Note that your regexp is overcomplicated. All you need is Pattern.compile(" (\\d{3}) "). \\s is for when you want to catch other whitespace such as tabs, but that's clearly not in your input, and there's no need to put [] around \\d. Technically just \\d{3} will do it, but that also matches the first three digits of any longer number.

    A cleaner take is perhaps simply "\\b\\d{3}\\b", taking the whole thing as the match (so, .group(0)). \\b means 'word boundary' and doesn't consume any characters at all; it matches the position between a word character (letter, digit, or underscore) and a non-word character, as well as the start or end of the string when a word character sits there.

    find() finds the next match; on a fresh matcher, that is the first one. There's also matches(), which is like find() except that it checks whether the entire input matches. With your .* on both ends it will, but using find() and leaving the .* out is a lot simpler.
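
To put the two approaches side by side, here is a minimal sketch. The class name StatusCodeDemo and the helper names extractWithFind and extractWithMatches are made up for illustration; only Pattern, Matcher, and Optional come from the JDK. It assumes you want the first standalone three-digit number and would rather get an empty Optional than an exception when there is none.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
final class StatusCodeDemo {
    // Three digits with a word boundary on each side; no capturing group needed.
    private static final Pattern THREE_DIGITS = Pattern.compile("\\b\\d{3}\\b");

    // The question's pattern (minus the redundant []), written to span the whole input.
    private static final Pattern WHOLE_LINE = Pattern.compile(".*\\s(\\d{3})\\s.*");

    /** Returns the first standalone three-digit number, using find(). */
    static Optional<String> extractWithFind(String input) {
        Matcher matcher = THREE_DIGITS.matcher(input);
        // group() is only legal after a successful find(), matches(), or lookingAt().
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    /** Same idea via matches(), which requires the pattern to cover the entire input. */
    static Optional<String> extractWithMatches(String input) {
        Matcher matcher = WHOLE_LINE.matcher(input);
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String test = "Server returned HTTP response code: 403 for URL: https://an.url.example";
        System.out.println(extractWithFind(test));         // prints Optional[403]
        System.out.println(extractWithMatches(test));      // prints Optional[403]
        System.out.println(extractWithFind("code 12345")); // prints Optional.empty
    }
}
```

Both helpers return the same result for the string in the question. The difference shows on input such as "403" alone: the find() version still returns it, while the matches() version returns empty because the pattern demands whitespace on both sides of the digits.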