I have a problem with my RegEx for matching log messages. We're using log4net and I want to capture four groups: timestamp, level, logger and message. The problem is that we use a semicolon to separate these groups, and sometimes the message contains semicolons as well.
Example entries in the log file:
2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;Waiting X to continue
(working)
2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;Framework;&lt;METADATA>Waiting X to continue&lt;/METADATA>
(not working)
For the second entry, my RegEx captures only "/METADATA>" as the message.
The problem is that I don't know how many semicolons there are in the message. However, I know that there can be 1-3 semicolons in the logger. Is it possible to write a RegEx that matches/ignores up to 3 semicolons?
As the example below shows, the logger starts and ends with a semicolon, like this:
;Request.Apply.Locked;business.Validator;Framework;Test;
There we have 5 semicolons, but 3 of them should be part of the logger group.
The log can be as long as:
2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;Framework;Test;&lt;METADATA>Waiting X to continue&lt;/METADATA>
Here is my current RegEx:
(?<timestamp>[\d-]+ [\d:,]+);(?<level>[A-Z]+)\s?;?\s?\s?(?<logger>[\s\S]*);(?<message>[\s\S]*)
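The failure can be reproduced with a short sketch. It is written in Python purely for illustration, so the named groups use Python's (?P<name>...) spelling instead of (?<name>...). It also assumes the raw log line contains the entity &lt; (with its own semicolon), which is what the "/METADATA>" result suggests.

```python
import re

# The current pattern, with named groups in Python's (?P<name>...) spelling.
CURRENT = re.compile(
    r"(?P<timestamp>[\d-]+ [\d:,]+);(?P<level>[A-Z]+)\s?;?\s?\s?"
    r"(?P<logger>[\s\S]*);(?P<message>[\s\S]*)"
)

# Assumed raw form of the second entry: the "<" is stored as the entity "&lt;".
line = (
    "2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;Framework;"
    "&lt;METADATA>Waiting X to continue&lt;/METADATA>"
)

result = CURRENT.match(line).groupdict()
# The greedy [\s\S]* in the logger group runs to the end of the line and then
# backtracks to the LAST semicolon, which is the one inside the final "&lt;".
print(result["logger"])
print(result["message"])
```

The logger group swallows everything up to the last semicolon, so only the tail after the final &lt; is left for the message.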
To summarize: I want a RegEx to group timestamp, level, logger, message and it should work for both example 1 and example 2.
Thanks in advance.
You could match a leading ; then 0-2 occurrences in between and an ending ; using a quantifier and a negated character class:
(?<timestamp>[\d-]+ [\d:,]+);(?<level>[A-Z]+)\s*;\s*(?<logger>(?:[^;\n]*;){0,2}[^;\n]*);(?<message>.*)
Explanation
(?<timestamp>[\d-]+ [\d:,]+);
    Group timestamp, followed by ;
(?<level>[A-Z]+)
    Group level matching 1+ chars A-Z
\s*;\s*
    Match ; between optional whitespace chars (note that \s can also match a newline)
(?<logger>
    Group logger
(?:[^;\n]*;){0,2}
    Repeat 0-2 times: any chars except ; or a newline, and then match ;
[^;\n]*
    Match optional chars other than ; or a newline
);
    Close group logger and match ;
(?<message>.*)
    Group message matching the rest of the line
See a regex demo.
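As a quick check outside a regex tester, here is a minimal sketch of the pattern applied to both example entries. It is in Python only for illustration, so the named groups are spelled (?P<name>...). The helper name parse_log_line is hypothetical, and the second entry is assumed to carry the entity &lt; in the raw log.

```python
import re

# Same pattern as above, with named groups in Python's (?P<name>...) spelling.
LOG_LINE = re.compile(
    r"(?P<timestamp>[\d-]+ [\d:,]+);"
    r"(?P<level>[A-Z]+)\s*;\s*"
    r"(?P<logger>(?:[^;\n]*;){0,2}[^;\n]*);"
    r"(?P<message>.*)"
)


def parse_log_line(line):
    """Return the four named groups as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


first = parse_log_line(
    "2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;Waiting X to continue"
)
# Assumed raw form of the second entry, with "<" stored as "&lt;".
second = parse_log_line(
    "2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;Framework;"
    "&lt;METADATA>Waiting X to continue&lt;/METADATA>"
)

print(first["logger"], "|", first["message"])
print(second["logger"], "|", second["message"])
```

In the first entry the quantifier backtracks from two repetitions to one, because no further ; follows the message text. In the second entry both repetitions are used, and the logger ends at Framework.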
Or, if the &lt; should not be part of the logger:
(?<timestamp>[\d-]+ [\d:,]+);(?<level>[A-Z]+)\s*;\s*(?<logger>(?:(?!&[lg]t;).)*);(?<message>.*)
See another regex demo
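A minimal sketch of this second pattern, again in Python for illustration only (named groups spelled (?P<name>...)). The third test line is a hypothetical entry, not one from the question: it has a two-part logger followed directly by &lt;, the case where the lookahead makes a difference.

```python
import re

# Second pattern: the logger may contain any char, as long as no "&lt;" or
# "&gt;" starts at that position (negative lookahead before each char).
LOG_LINE_NO_ENTITY = re.compile(
    r"(?P<timestamp>[\d-]+ [\d:,]+);"
    r"(?P<level>[A-Z]+)\s*;\s*"
    r"(?P<logger>(?:(?!&[lg]t;).)*);"
    r"(?P<message>.*)"
)

prefix = "2023-02-24 10:06:41,903;WARN;Request.Apply.Locked;business.Validator;"
lines = [
    prefix + "Waiting X to continue",
    # Assumed raw form of the second entry, with "<" stored as "&lt;".
    prefix + "Framework;&lt;METADATA>Waiting X to continue&lt;/METADATA>",
    # Hypothetical entry: two-part logger directly followed by the entity.
    prefix + "&lt;METADATA>Waiting X to continue&lt;/METADATA>",
]

parsed = [LOG_LINE_NO_ENTITY.match(line).groupdict() for line in lines]
for groups in parsed:
    print(groups["logger"], "|", groups["message"])
```

For the hypothetical third line, the logger stops at business.Validator and the message starts at &lt;, whereas the first pattern would pull "&lt" into the logger group.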