
Reuse of a capture in grok


I am trying to reuse a capture in the Logstash grok plugin. I have this log file:

30 Jul 2019 09:56:28 <ID1> DEVICE0 START_THREAD THREAD_ID(B01234)
30 Jul 2019 09:56:28 <ID1> DEVICE1 START_THREAD THREAD_ID(B12345)
30 Jul 2019 09:56:28 <ID1> DEVICE2 START_THREAD THREAD_ID(A12345)
30 Jul 2019 09:56:28 <ID2> DEVICE1 PROCESSING SPOOLID 100 
30 Jul 2019 09:56:28 <ID2> DEVICE2 PROCESSING SPOOLID 101
30 Jul 2019 09:56:28 <ID2> DEVICE2 PROCESSING SPOOLID 101
30 Jul 2019 09:56:28 <ID2> DEVICE1 PROCESSING SPOOLID 100
30 Jul 2019 09:56:28 <ID4> DEVICE1 SPOOLID 100 PROCESSED
30 Jul 2019 09:56:28 <ID4> DEVICE2 SPOOLID 101 PROCESSED
30 Jul 2019 09:56:28 <ID3> DEVICE2 STOP_THREAD THREAD_ID(B12345) 
30 Jul 2019 09:56:28 <ID3> DEVICE2 STOP_THREAD THREAD_ID(A12345)
30 Jul 2019 09:56:28 <ID1> DEVICE2 START_THREAD THREAD_ID(A23456)
30 Jul 2019 09:56:29 <ID2> DEVICE2 PROCESSING SPOOLID 102
30 Jul 2019 09:56:29 <ID2> DEVICE2 PROCESSING SPOOLID 102
30 Jul 2019 09:56:29 <ID4> DEVICE2 SPOOLID 102 PROCESSED
30 Jul 2019 09:56:29 <ID3> DEVICE2 STOP_THREAD THREAD_ID(A23456) 
30 Jul 2019 09:56:29 <ID2> DEVICE0 PROCESSING SPOOLID 99 
30 Jul 2019 09:56:29 <ID4> DEVICE0 SPOOLID 99 PROCESSED
30 Jul 2019 09:56:29 <ID3> DEVICE0 STOP_THREAD THREAD_ID(B12345)

What I would like to do is capture the DEVICE and its SPOOLID in a single Logstash event. So far I have built this regular expression, which gives me the correct SPOOLID for the corresponding DEVICE:

/.*?\>\s+(\b.*?\b)\s*START_THREAD.*?\1\s+SPOOLID\s+(\d+)\s+PROCESSED/ms

I have been trying to translate this regex to grok with this pattern:

(?m)%{DATA}\>\s+\b%{DATA:device}\b\s*START_THREAD%{DATA}(?<device>\s+SPOOLID\s+%{NUMBER:num}\s+PROCESSED)

Unfortunately, I get the wrong SPOOLID for the device (for DEVICE0 I get SPOOLID 100 instead of 99). I cannot figure out what is wrong with my grok pattern. Can someone spot the error and fix it?


Solution

  • The %{...} patterns do not work the same way as named capturing groups, so they cannot be referred back to directly. Wrap the pattern in an auxiliary named capturing group and refer to that group with a named backreference, using the \k<name> syntax:

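    (?m)%{DATA}>\s+(?<aux>\b%{DATA:device}\b)\s*START_THREAD%{DATA}\k<aux>\s+SPOOLID\s+%{NUMBER:num}\s+PROCESSED

    Note the (?<aux>\b%{DATA:device}\b) named group and the \k<aux> backreference here. The trailing SPOOLID part is left outside any named group, so only %{DATA:device} populates the device field.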

    See the Oniguruma regex syntax documentation for more on named groups and named backreferences.
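
To see the effect of the backreference outside Logstash, here is a minimal sketch in Python. It is not grok: Python's re module spells a named group (?P<name>...) and a named backreference (?P=name), where Oniguruma uses (?<name>...) and \k<name>. The %{DATA} and %{NUMBER} patterns are approximated with .*? and \d+, the log is a shortened excerpt of the one in the question, and the helper name first_pair is made up for this illustration.

```python
import re

# Shortened excerpt of the log from the question.
LOG = """\
30 Jul 2019 09:56:28 <ID1> DEVICE0 START_THREAD THREAD_ID(B01234)
30 Jul 2019 09:56:28 <ID1> DEVICE1 START_THREAD THREAD_ID(B12345)
30 Jul 2019 09:56:28 <ID2> DEVICE1 PROCESSING SPOOLID 100
30 Jul 2019 09:56:28 <ID4> DEVICE1 SPOOLID 100 PROCESSED
30 Jul 2019 09:56:29 <ID2> DEVICE0 PROCESSING SPOOLID 99
30 Jul 2019 09:56:29 <ID4> DEVICE0 SPOOLID 99 PROCESSED
"""

# With the auxiliary group: (?P=aux) forces the PROCESSED line to name
# the same device that was captured on the START_THREAD line.
# re.DOTALL plays the role of (?m) in the grok pattern (dot matches newline).
WITH_BACKREF = re.compile(
    r">\s+(?P<aux>\b(?P<device>.*?)\b)\s*START_THREAD"
    r".*?(?P=aux)\s+SPOOLID\s+(?P<num>\d+)\s+PROCESSED",
    re.DOTALL,
)

# Without a backreference: any device's PROCESSED line is accepted.
WITHOUT_BACKREF = re.compile(
    r">\s+\b(?P<device>.*?)\b\s*START_THREAD"
    r".*?\s+SPOOLID\s+(?P<num>\d+)\s+PROCESSED",
    re.DOTALL,
)


def first_pair(pattern, text):
    """Return (device, spool id) for the first match, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group("device"), match.group("num")


print(first_pair(WITH_BACKREF, LOG))     # ('DEVICE0', '99')
print(first_pair(WITHOUT_BACKREF, LOG))  # ('DEVICE0', '100')
```

The second result reproduces the symptom from the question: without the backreference, DEVICE0 is paired with the first PROCESSED line in the text, which belongs to DEVICE1.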