I have a problem with the following regular expression. I have this string that I want to match:
String test = "Server returned HTTP response code: 403 for URL: https://an.url.example";
I want to "extract" the error code from this string using this regex:
Pattern pattern = Pattern.compile(".*\\s([\\d]{3})\\s.*");
Matcher matcher = pattern.matcher(test);
System.out.println(matcher.group(1));
But when I run this code I always get this exception:
java.lang.IllegalStateException: No match found
at java.base/java.util.regex.Matcher.checkMatch(Matcher.java:1852)
at java.base/java.util.regex.Matcher.group(Matcher.java:687)...
Then I thought my regex was wrong, so I tested it on a website.
Picture of the regex matching the HTTP-response
Now I am even more confused and I really don't know what I did wrong (Maybe the method calls?). I just hope someone can help me and clear up my confusion. Have a great day!
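The Matcher API works like this:
You create your matcher, then you need to call a method to actually run it. Until you do, no match has been attempted, which is why group(1) throws IllegalStateException: No match found. The idea is that you can write something like this: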
while (matcher.find()) {
int number = Integer.parseInt(matcher.group(1));
System.out.println("Found a number: " + number);
}
In your case, you just care about a single find. Hence, you probably want if, not while, here:
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(1));
} else {
// What do you want to do if 'test' does not contain this string?
}
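For completeness, here is a minimal self-contained sketch of that fix. The class name StatusCodeExtractor, the method extractCode and the choice to return an OptionalInt are my own assumptions for illustration, not something from your code; the pattern and the test string are the ones from the question.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class StatusCodeExtractor {
    // Same pattern as in the question, compiled once and reused.
    private static final Pattern CODE = Pattern.compile(".*\\s([\\d]{3})\\s.*");

    // Returns the three-digit code, or an empty OptionalInt if nothing matched.
    static OptionalInt extractCode(String message) {
        Matcher matcher = CODE.matcher(message);
        // find() actually runs the match; only after it returns true is group(1) valid.
        if (matcher.find()) {
            return OptionalInt.of(Integer.parseInt(matcher.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        String test = "Server returned HTTP response code: 403 for URL: https://an.url.example";
        System.out.println(extractCode(test));
        System.out.println(extractCode("no code here"));
    }
}
```

Returning an empty OptionalInt is just one way to fill in the else branch; throwing an exception or returning a default value would work as well, depending on what the caller expects.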
Note that your regexp is overcomplicated. All you need is Pattern.compile(" (\\d{3}) "). \\s is for when you want to catch other whitespace such as tabs, but that's clearly not in your input, and there's no need to put [] around \\d. Technically just \\d{3} will do it, but that'll find multiple matches in a number longer than 3 digits. A cleaner take is perhaps simply "\\b\\d{3}\\b", taking the whole thing as a match (so, .group(0)). \\b means 'word boundary' and doesn't consume any characters at all; it simply matches in the 'space between' a word character and whatever is next to it. For a number, that means start of string, end of string, space, tab, that sort of thing marks a boundary.
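To see the difference between the bare and the word-boundary version, here is a small sketch. The class WordBoundaryDemo, the helper allMatches and the sample input containing a longer number are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named for illustration only.
class WordBoundaryDemo {
    // Collects every whole match of the regex in the input.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group()); // group() is the same as group(0)
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up input with a three-digit code and a longer number.
        String input = "code: 403 after 1234567 ms";
        System.out.println(allMatches("\\d{3}", input));       // prints [403, 123, 456]
        System.out.println(allMatches("\\b\\d{3}\\b", input)); // prints [403]
    }
}
```

Without the boundaries the longer number gets chopped into three-digit pieces; with them, only a standalone three-digit number matches.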
find() looks for the next match, which on a fresh matcher is the first one. There's also matches(), which is like find(), except it checks if the entire input matches. With your .* on both ends it will, but using find() and leaving the .* out is a lot simpler.
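A quick sketch of how the two methods behave on the string from the question. The class FindVersusMatches and its two helper methods are hypothetical names for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, named for illustration only.
class FindVersusMatches {
    // True if the regex matches anywhere inside the input.
    static boolean findsSomewhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the regex matches the input from start to end.
    static boolean matchesEntirely(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String test = "Server returned HTTP response code: 403 for URL: https://an.url.example";
        System.out.println(findsSomewhere("\\b\\d{3}\\b", test));              // prints true
        System.out.println(matchesEntirely("\\b\\d{3}\\b", test));             // prints false
        System.out.println(matchesEntirely(".*\\s([\\d]{3})\\s.*", test));     // prints true
    }
}
```

So your original pattern would also have worked with matches(), because the .* on both sides lets it cover the whole input.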