
Multi-Line Log File Parsing with Ruby Regex in FluentD


I've got a log that has lines like this:

6/10/2022 10:06:16.908 | INFO | CLASS | BlankStart,15,1,2

But sometimes the log message is a long JSON blob that spans multiple lines.

Example of a bad log line:

6/10/2022 10:06:16.908 | INFO | CLASS | Obj: { "test": false,
"reso": true }

Full example with 3 expected matches:

6/10/2022 10:06:16.908 | INFO | CLASS | BlankStart,15,1,2
6/10/2022 10:06:16.908 | INFO | CLASS | Obj: { "test": false,
"reso": true }
6/10/2022 10:06:16.908 | INFO | CLASS | BlankStart,15,1,2

Here is my regex as it stands. I added 'strict' date checking at the start of each line so that, for now, it captures the message only up to the end of the first line and ignores the continuation lines.

(?<time>^\d{1,2}\/\d{1,2}\/\d{4}\s\d{2}:\d{2}:\d{2}\.\d+)...(?<type>.[^| ]*)...(?<class>.[^| ]*)..(?<msg>.*)

The Fluentd docs talk about using the /m (multiline) option, but I can't work out how to use it correctly in the regex.

https://docs.fluentd.org/parser/regexp#multiline
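
For reference, here is a minimal Ruby sketch of what the m option changes: without it, `.` stops at a newline, and with it `.` also matches newlines, so `(?<msg>.*)` can run across lines. The pattern below is an assumption modelled on the regex above (same group names, separators written out explicitly), and the variable names are only illustrative. It shows plain Ruby regex behavior, not Fluentd itself.

```ruby
# One record from the question: the JSON payload spans two lines.
record = %(6/10/2022 10:06:16.908 | INFO | CLASS | Obj: { "test": false,\n"reso": true })

# Illustrative pattern with the same named groups as the regex above.
pattern_src = '^(?<time>\d{1,2}/\d{1,2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d+) \| ' \
              '(?<type>[^| ]+) \| (?<class>[^| ]+) \| (?<msg>.*)'

single_line = Regexp.new(pattern_src)                     # "." stops at "\n"
multi_line  = Regexp.new(pattern_src, Regexp::MULTILINE)  # same as writing /.../m

msg_without_m = single_line.match(record)[:msg]
msg_with_m    = multi_line.match(record)[:msg]

puts msg_without_m  # only the first line of the message
puts msg_with_m     # the whole two-line message
```

Note that the m option only helps if the parser is handed the whole multi-line record as one string; it does not by itself join lines that arrive separately.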



Solution

  • Switching the parser type from regexp to multiline in the parse section fixed this. With multiline, format_firstline defines what the start of a log record looks like. Every following line that does not match it is appended to the current record, and a new record begins only when a line matches format_firstline again.

    <parse>
      @type multiline
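      # A record starts on a line that begins with a timestamp such as 6/10/2022 10:06:16.908
      format_firstline /^\d{1,2}\/\d{1,2}\/\d{4}\s\d{2}:\d{2}:\d{2}\.\d+/
      # Applied to the whole joined record, so msg also captures the continuation lines
      format1 /^(?<time>\d{1,2}\/\d{1,2}\/\d{4}\s\d{2}:\d{2}:\d{2}\.\d+)\s*\|\s*(?<type>[^| ]+)\s*\|\s*(?<class>[^| ]+)\s*\|\s*(?<msg>.*)/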
    </parse>
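
To check the patterns outside Fluentd, the grouping described above can be emulated in plain Ruby. This is a rough sketch, not Fluentd's actual implementation: `group_records`, `FIRSTLINE`, `FORMAT1` and `SAMPLE` are hypothetical names, the patterns are modelled on the config above with the separators spelled out, and `FORMAT1` is compiled with `Regexp::MULTILINE` on the assumption that `.` should match newlines inside a joined record.

```ruby
# Rough emulation of multiline grouping; not Fluentd's own code.
FIRSTLINE = /^\d{1,2}\/\d{1,2}\/\d{4}\s\d{2}:\d{2}:\d{2}\.\d+/
FORMAT1 = Regexp.new(
  '^(?<time>\d{1,2}/\d{1,2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d+)\s*\|\s*' \
  '(?<type>[^| ]+)\s*\|\s*(?<class>[^| ]+)\s*\|\s*(?<msg>.*)',
  Regexp::MULTILINE # "." also matches "\n", so msg can span lines
)

# The full example from the question.
SAMPLE = <<~TEXT
  6/10/2022 10:06:16.908 | INFO | CLASS | BlankStart,15,1,2
  6/10/2022 10:06:16.908 | INFO | CLASS | Obj: { "test": false,
  "reso": true }
  6/10/2022 10:06:16.908 | INFO | CLASS | BlankStart,15,1,2
TEXT

# Start a new record on every line matching FIRSTLINE;
# append any other line to the record currently being built.
def group_records(lines)
  records = []
  lines.each do |line|
    if line.match?(FIRSTLINE)
      records << line.dup
    elsif !records.empty?
      records.last << line
    end
  end
  records
end

events = group_records(SAMPLE.lines).map do |record|
  match = FORMAT1.match(record.chomp)
  match && match.named_captures
end

events.each { |event| p event }
```

Running this against the sample yields three events, with the second event's msg holding both lines of the JSON blob.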