I'm learning how to capture different fields from a log file with Logstash's grok filter for the first time, and I'm having trouble getting it to parse correctly. I'm using https://grokdebug.herokuapp.com/ to test my work. This is an example log line:
06/05/2021 15:08:48.591 - [aa.bbbbbbbbbbbbbbb.cccccccccc.ddddddd.EEEeeeEeeeeEeeeeee.ffffFffffFff] - [INFO] - some more text here (0:1): {"data":{"source":"ttyUSB0","timeTotal":"20","timeLeft":"10"},"somethingid":"main","secret":"aqdsaqlaxgaaaaaa444aa32aa1aa3aaa1aaaaaaawghhjuyeqbbjjga7a64aaa","type":"TEST","message":"SOMEMESSAGE","testid":"foo.bar1.1620313718583","timestamp":1620313728590}
The grok expression I'm using (which returns no result) is:
%{DATESTAMP:timestamp} - (?<test_data>(?<=\[)([a-zA-Z\.\[\]])*) - (?<rest>(?<=\[)(\[(\w*)\]))
When I remove the lookbehind expression (?<=\[) from (?<test_data>(?<=\[)([a-zA-Z\.\[\]])*) and (?<rest>(?<=\[)(\[(\w*)\])), I get the following result:
test_data: [aa.bbbbbbbbbbbbbbb.cccccccccc.ddddddd.EEEeeeEeeeeEeeeeee.ffffFffffFff]
rest: [INFO]
The result I'm hoping to get is:
test_data: aa.bbbbbbbbbbbbbbb.cccccccccc.ddddddd.EEEeeeEeeeeEeeeeee.ffffFffffFff
rest: INFO
I would appreciate help with, or an explanation of, what I'm doing wrong.
You are matching a sequence of patterns, so you need to consume them, otherwise the regex engine cannot reach the subsequent (rightmost) pattern parts.
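Lookarounds do not consume text; they only check the context at the current position. In your pattern, (?<=\[) comes right after the literal " - ", so the character immediately before that position is a space. The lookbehind requires that same character to be a [ char, which is impossible, so it can never match. This is why the pattern you have is not working.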
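To see the effect outside of grok, here is a minimal sketch using Python's re module as a stand-in for the Oniguruma engine that grok uses (lookbehind behaves the same way for this case). The stamp pattern is a simplified placeholder for %{DATESTAMP}, not its real definition, and the log line is shortened for readability.

```python
import re

line = "06/05/2021 15:08:48.591 - [aa.bbb.ccc] - [INFO] - some more text"

# Simplified stand-in for grok's DATESTAMP (an assumption for this demo).
stamp = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}"

# Lookbehind placed right after " - ": the previous character is a space,
# so (?<=\[) cannot succeed at that position.
with_lookbehind = re.compile(stamp + r" - (?<=\[)[a-zA-Z.\[\]]*")

# Same pattern without the lookbehind: the character class consumes the brackets.
without_lookbehind = re.compile(stamp + r" - ([a-zA-Z.\[\]]*)")

failed = with_lookbehind.search(line)
matched = without_lookbehind.search(line)

print(failed)            # None
print(matched.group(1))  # [aa.bbb.ccc]
```

The second pattern matches only because the character class itself includes [ and ], which is also why the brackets end up inside the captured value.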
You can use
%{DATESTAMP:timestamp} - \[%{DATA:test_data}\] - \[%{DATA:rest}\]
Now, the regex engine will find the timestamp pattern, then it will consume space + - + space, then a [ char, then test_data, then ] - [, then the rest part, and finally a ] char.
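The same idea can be checked quickly in Python. This is only a rough equivalent, under two assumptions: the timestamp group is a simplified placeholder for %{DATESTAMP}, and %{DATA} is written out as the lazy .*? it expands to. Python spells named groups as (?P<name>...) rather than grok's (?<name>...), and the log line is shortened.

```python
import re

line = ('06/05/2021 15:08:48.591 - [aa.bbb.ccc.ffffFffffFff] - [INFO] - '
        'some more text here (0:1): {"type":"TEST"}')

pattern = re.compile(
    # Simplified stand-in for %{DATESTAMP:timestamp} (an assumption for this demo).
    r"(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})"
    # Literal brackets are consumed outside the group, so they are not captured.
    r" - \[(?P<test_data>.*?)\]"
    r" - \[(?P<rest>.*?)\]"
)

fields = pattern.search(line).groupdict()

print(fields["test_data"])  # aa.bbb.ccc.ffffFffffFff
print(fields["rest"])       # INFO
```

Because the \[ and \] sit outside the named groups, the engine consumes them as plain text and the captured fields come out without brackets.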