Need to parse a log and create two capture groups where one group's text is contained in the other


I was wondering if someone could help me with a parsing problem. I've been working on parsing a particular log using named capture groups (Description, FooBar, etc.), and it has been a big challenge.

The log file looks like this:

2021-02-10T09:0022.041-05:00 | Info | TransactionGUID=yyyy1234-12a1-1a99-1234-01ab1ab12abc | TransactionID=123456 | Saving uploaded file to shared folder \\foobar\foo\fil\ENV1\ABMylocingZone\TIMS\FileTemplates\12345678_12345678_01ab1ab12abc-99f5-4a43-9127-01ab1ab12abc.xlsx | CopyToSharedFolder()

I need to place this set of text:

Saving uploaded file to shared folder \\foobar\foo\fil\ENV1\ABMylocingZone\TIMS\FileTemplates\12345678_12345678_01ab1ab12abc-99f5-4a43-9127-01ab1ab12abc.xlsx | CopyToSharedFolder()

into the "Description" capture group.

I need to place this set of text:

12345678

in the "FooBar" capture group.

Below is what I have come up with so far. If I try to add the FooBar capture group (omitted from the rule below), I lose part of the Description capture group. Because of the application I'm working with, I need to use the Grok Debugger to create and debug my rule:

[A-Za-z0-9]{0,7}%{SPACE}%{TIMESTAMP_ISO8601:dateTime}%{SPACE}\|%{SPACE}%{LOGLEVEL:Loglevel}%{SPACE}\|%{SPACE}TransactionGUID=%{UUID:GUID}%{SPACE}\|%{SPACE}TransactionID=%{NUMBER:TransactionId}%{SPACE}\|%{SPACE}(?<Description>(?<=\|\s).*(?=\)?))

Solution

  • Short version:

    This message...

    MyGroup12345679ContainsInfo
    

    ... is captured by the message group, and the number inside it is also captured by the nested hidden_message group.

    (?<message>[a-zA-Z]+(?<hidden_message>%{NUMBER})[a-zA-Z]+)
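
The nesting itself can be checked outside the Grok Debugger. Below is a minimal sketch in Python, used here only as a stand-in regex engine: Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, and `[0-9]+` is an assumed simplification of `%{NUMBER}` that is good enough for this plain-integer sample.

```python
import re

# Assumption: [0-9]+ approximates grok's %{NUMBER} for this integer-only sample.
# The inner group is nested inside the outer one, so both capture the digits.
pattern = re.compile(r"(?P<message>[a-zA-Z]+(?P<hidden_message>[0-9]+)[a-zA-Z]+)")

match = pattern.fullmatch("MyGroup12345679ContainsInfo")
groups = match.groupdict()

print(groups["message"])         # MyGroup12345679ContainsInfo
print(groups["hidden_message"])  # 12345679
```

The outer group keeps its full text; nesting a group inside it does not remove anything from the outer capture.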
    

    Complete version:

    As for your exact log, I would parse it this way (I had to replace UUID with NUMBER for testing purposes):

    grok {
        # Field name => list of patterns, tried in order.
        # Unless the description always contains a path, also append the pattern you are using now as a fallback.
        match => {
            "message" => [
                "%{TIMESTAMP_ISO8601:dateTime}%{SPACE}\|%{SPACE}%{LOGLEVEL:Loglevel}%{SPACE}\|%{SPACE}TransactionGUID=%{NUMBER:GUID}%{SPACE}\|%{SPACE}TransactionID=%{NUMBER:TransactionId}%{SPACE}\|%{SPACE}(?<Description>.*(\\(?<FooBar>[0-9]+)_[^\\]+\.[a-zA-Z0-9]+).*)"
            ]
        }
    }
    

    Tested log:

    2021-02-10T09:0022.041-05:00 | Info | TransactionGUID=82147 | TransactionID=123456 | Saving uploaded file to shared folder \\foobar\foo\fil\ENV1\ABMylocingZone\TIMS\FileTemplates\12345678_12345678_01ab1ab12abc-99f5-4a43-9127-01ab1ab12abc.xlsx | CopyToSharedFolder()
    

    The Description part, explained:

    .*                          # greedily consumes characters
    (                           # matches a filename beginning with a number:
      \\(?<FooBar>[0-9]+)       #   one "\", then digits captured as FooBar
      _[^\\]+                   #   one "_", then one or more characters other than "\"
      \.[a-zA-Z0-9]+            #   a file extension
    )
    .*                          # the rest of the line
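
To see how the Description pattern behaves on the tested log, here is a rough sketch in Python. It is an approximation, not the grok engine: the named groups are rewritten to Python's `(?P<name>...)` syntax, and only the part of the line after `TransactionID=123456 | ` is matched, so the `%{...}` grok patterns are left out entirely.

```python
import re

# Assumption: same regex body as the grok rule, with Python-style named groups.
description_re = re.compile(
    r"(?P<Description>.*(\\(?P<FooBar>[0-9]+)_[^\\]+\.[a-zA-Z0-9]+).*)"
)

# The tail of the tested log line, i.e. everything after the TransactionID field.
tail = (
    r"Saving uploaded file to shared folder "
    r"\\foobar\foo\fil\ENV1\ABMylocingZone\TIMS\FileTemplates"
    r"\12345678_12345678_01ab1ab12abc-99f5-4a43-9127-01ab1ab12abc.xlsx"
    r" | CopyToSharedFolder()"
)

m = description_re.match(tail)
print(m.group("FooBar"))               # 12345678
print(m.group("Description") == tail)  # True
```

Because FooBar is nested inside Description, the outer group still holds the whole text, including the trailing `| CopyToSharedFolder()`. The leading greedy `.*` backtracks to the last `\` that is followed by digits and an underscore, which is why FooBar picks up the first number of the filename rather than a folder name.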