I've a log line as follows:
[2021-03-10 00:13:32.901] [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG] [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d] [GreeterImpl] Hello John
It's 6 blocks of text within [] and then the rest. I'm looking for a regex to extract the text within [], and also the text at the end. A text block within [] can be empty.
I tried
(?:\[([^\[\]]*)\])+([^\[\]]+)
but it only matches the first block in []. I've also tried
(?:(?<=\[)[^\[\]]*(?=\]))+([^\[\]]+)
but that doesn't match anything.
FWIW, the regex will be implemented in Java.
Short edit: This slightly simpler regular expression works too:
(?:(?<=\[)[^\[\]]*)|(?:(?<=\])[^\[\]]*$)
I have taken it from your own comment.
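To compare this simpler expression with the longer one, here is a minimal sketch that collects its matches into a list. The class name SimplerRegexDemo, the method extract and the shortened sample line are my own placeholders, not part of the question. One difference I noticed: this lookbehind only checks for ] and not for the space after it, so the trailing text comes back with its leading space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplerRegexDemo {
    // The simpler expression from the short edit, escaped for a Java string literal.
    private static final Pattern FIELD = Pattern.compile(
            "(?:(?<=\\[)[^\\[\\]]*)|(?:(?<=\\])[^\\[\\]]*$)");

    // Hypothetical helper: returns every match in the order it is found.
    static List<String> extract(String logLine) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            fields.add(m.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample line with one empty block.
        for (String field : extract("[a] [] [b] Hello John")) {
            // Angle brackets make leading spaces and empty matches visible.
            System.out.println("<" + field + ">");
        }
    }
}
```

If the leading space is unwanted, calling trim() on the last element should be enough.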
Original answer follows.
TL;DR
(?:(?<=^\[| \[)[^\[\]]*)|(?:(?<=\] )[^\[\]]*$)
Explanation: There are two parts separated by |, “or”.
(?:(?<=^\[| \[)[^\[\]]*) matches what is inside square brackets. [^\[\]]* near the end matches the longest possible run of characters that are neither [ nor ]. The lookbehind (?<=^\[| \[) requires that run to be preceded either by the beginning of the string and a [, or by a space and a [. Finally I have put the whole thing into a non-capturing group to make sure that the lookbehind has precedence over the |.
(?:(?<=\] )[^\[\]]*$) matches what is outside square brackets at the end of the log line (Hello John in the example). This time the run of non-brackets must be preceded by ] and a space, and followed by the end of the line.
See it in action:
On regex101 where I built it
In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String logLine = "[2021-03-10 00:13:32.901]"
        + " [DefaultDispatcher-worker-2 @coroutine#3] [DEBUG]"
        + " [4231c006d9083a302fce59d5f0957226] [42c5ac3c0acfc68d]"
        + " [GreeterImpl] Hello John";
Matcher m = Pattern
        .compile("(?:(?<=^\\[| \\[)[^\\[\\]]*)|(?:(?<=\\] )[^\\[\\]]*$)")
        .matcher(logLine);
// Each find() yields the next block content, and finally the trailing text.
while (m.find()) {
    System.out.println(m.group());
}
Output is:
2021-03-10 00:13:32.901
DefaultDispatcher-worker-2 @coroutine#3
DEBUG
4231c006d9083a302fce59d5f0957226
42c5ac3c0acfc68d
GreeterImpl
Hello John
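Since the question says that a block within [] can be empty, here is a small sketch that checks that case with the same expression. The class LogLineFields, the method extract and the shortened sample line are names and data I made up for illustration. Treat it as one possible way to package the loop, not the only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineFields {
    // Same expression as in the TL;DR; a compiled Pattern can be reused across calls.
    private static final Pattern FIELD = Pattern.compile(
            "(?:(?<=^\\[| \\[)[^\\[\\]]*)|(?:(?<=\\] )[^\\[\\]]*$)");

    // Hypothetical helper: bracketed blocks in order, then the trailing message.
    static List<String> extract(String logLine) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            fields.add(m.group()); // an empty block [] is added as ""
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up variant of the log line with an empty second block.
        String logLine = "[2021-03-10 00:13:32.901] [] [DEBUG] [GreeterImpl] Hello John";
        for (String field : extract(logLine)) {
            System.out.println("<" + field + ">");
        }
    }
}
```

The empty block shows up as an empty string in its position, so the indices of the remaining fields stay stable.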
A different idea: String.split()
String[] tokens = logLine.split("\\] \\[|\\] (?!\\[)");
assert tokens[0].startsWith("[") : logLine;
tokens[0] = tokens[0].substring(1);
for (String token : tokens) {
    System.out.println(token);
}
Output is the same as before.
I am splitting at either ] [ or at ] and a space not followed by [ (for the last split). It leaves the first [ intact, so I have to remove that separately, which is not so nice. Otherwise I find it simpler to understand than the other solutions.
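One way to tidy up the split() idea, sketched under my own assumptions: strip the first [ before splitting, and use an explicit check instead of assert, since assert statements are skipped unless the JVM runs with -ea. The class LogLineSplitter and the method split are hypothetical names, and throwing IllegalArgumentException is just one possible reaction to a malformed line.

```java
import java.util.Arrays;
import java.util.List;

class LogLineSplitter {
    // Hypothetical helper wrapping the split() approach in a method.
    static List<String> split(String logLine) {
        // Explicit check: unlike assert, this also runs when assertions are disabled.
        if (!logLine.startsWith("[")) {
            throw new IllegalArgumentException("Expected a leading '[': " + logLine);
        }
        // Drop the first '[' up front, then split at "] [" or at "] " not followed by '['.
        String[] tokens = logLine.substring(1).split("\\] \\[|\\] (?!\\[)");
        return Arrays.asList(tokens);
    }

    public static void main(String[] args) {
        // Made-up variant of the log line with an empty second block.
        String logLine = "[2021-03-10 00:13:32.901] [] [DEBUG] [GreeterImpl] Hello John";
        for (String token : split(logLine)) {
            System.out.println("<" + token + ">");
        }
    }
}
```

Empty blocks in the middle come out as empty strings here as well. A line that ends directly after the last ] without a following space would keep that ] in the last token, so this sketch assumes the same shape as the example line.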