regex, logstash, grok

GROK regex capture groups not matching


I'm learning how to capture different fields from a log file using Logstash's grok filter for the first time, and I'm having trouble parsing it correctly. I'm using https://grokdebug.herokuapp.com/ to test my work. This is an example log line:

06/05/2021 15:08:48.591 - [aa.bbbbbbbbbbbbbbb.cccccccccc.ddddddd.EEEeeeEeeeeEeeeeee.ffffFffffFff] - [INFO] - some more text here (0:1): {"data":{"source":"ttyUSB0","timeTotal":"20","timeLeft":"10"},"somethingid":"main","secret":"aqdsaqlaxgaaaaaa444aa32aa1aa3aaa1aaaaaaawghhjuyeqbbjjga7a64aaa","type":"TEST","message":"SOMEMESSAGE","testid":"foo.bar1.1620313718583","timestamp":1620313728590}

The grok expression I'm using (which returns no result) is:

%{DATESTAMP:timestamp} - (?<test_data>(?<=\[)([a-zA-Z\.\[\]])*) - (?<rest>(?<=\[)(\[(\w*)\]))

When I remove the lookbehind expression (?<=\[) from both (?<test_data>(?<=\[)([a-zA-Z\.\[\]])*) and (?<rest>(?<=\[)(\[(\w*)\])), I get the following result:

  • test_data: [aa.bbbbbbbbbbbbbbb.cccccccccc.ddddddd.EEEeeeEeeeeEeeeeee.ffffFffffFff]
  • rest: [INFO]

The result I'm hoping to get is:

  • test_data: aa.bbbbbbbbbbbbbbb.cccccccccc.ddddddd.EEEeeeEeeeeEeeeeee.ffffFffffFff
  • rest: INFO

I would appreciate help with, or an explanation of, what I'm doing wrong.


Solution

  • A regex matches its parts in sequence, so each part of the input must be consumed before the engine can reach the subsequent (rightmost) parts of the pattern.

    Lookarounds do not consume text; they only check the context at the current position. In your pattern, (?<=\[) is tested right after " - " has been consumed, so the character immediately to its left is a space. A space cannot be a [ char at the same time, so the lookbehind never succeeds at that position. This is why your pattern returns no result.

    You can use

    %{DATESTAMP:timestamp} - \[%{DATA:test_data}\] - \[%{DATA:rest}\]
    

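    Now, the regex engine finds the timestamp, consumes space + - + space and a literal [ char, captures test_data up to the following ] - [ sequence (which it also consumes), captures rest, and finally consumes the closing ] char. Because the brackets are matched outside the named captures, they do not end up in the field values.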
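
The difference between the two approaches can be reproduced outside Logstash. The following is only a minimal Python sketch, not grok itself. It assumes a hand-written `TS` regex as a rough stand-in for DATESTAMP, uses `.*?` in place of DATA, works on a shortened copy of the log line, and uses Python's (?P<name>...) syntax instead of grok's (?<name>...).

```python
import re

# Shortened copy of the log line from the question (illustrative only).
LINE = (
    '06/05/2021 15:08:48.591 - [aa.bbb.ccc.ffffFffffFff] - [INFO] - '
    'some more text here (0:1): {"type":"TEST"}'
)

# Hypothetical stand-in for grok's DATESTAMP; the real pattern is broader.
TS = r"(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)"

# Original approach: once " - " has been consumed, the character to the left
# of the current position is a space, so the lookbehind (?<=\[) cannot succeed.
with_lookbehind = re.compile(TS + r" - (?P<test_data>(?<=\[)[a-zA-Z.\[\]]*)")

# Suggested approach: consume the brackets as literals, outside the groups.
# The lazy .*? plays the role of grok's DATA here.
consuming = re.compile(TS + r" - \[(?P<test_data>.*?)\] - \[(?P<rest>.*?)\]")

failed = with_lookbehind.search(LINE)
matched = consuming.search(LINE)

print(failed)               # no match object is returned
print(matched.groupdict())  # timestamp, test_data and rest without brackets
```

The second pattern keeps the [ and ] characters in the consumed part of the expression but outside the named groups, which is what removes them from the captured values.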