
Docker syslog driver with multiline parsing in logstash


I am forwarding my Docker logs to Logstash via the syslog log driver. This works well for single-line log entries, but I am having trouble with multiline entries. The problem is that Docker's log forwarding prepends the syslog header to every log line. If I use the Logstash multiline filter (which Logstash does not recommend), I can combine the lines and strip the syslog header from the continuation lines; however, that filter is not thread safe. I cannot get the same logic to work with an input codec, which is what Logstash recommends.

So for example:

Docker command:

docker run --rm -it \
      --log-driver syslog \
      --log-opt syslog-address=tcp://localhost:15008 \
      helloworld:latest

Logs in docker container:

Log message A
<<ML>> Log message B
  more B1
  more B2
  more B3
Log message C

Logs as received by Logstash:

<30>Jul 13 16:04:36 [1290]: Log message A
<30>Jul 13 16:04:37 [1290]: <<ML>> Log message B
<30>Jul 13 16:04:38 [1290]:  more B1
<30>Jul 13 16:04:39 [1290]:  more B2
<30>Jul 13 16:04:40 [1290]:  more B3
<30>Jul 13 16:04:41 [1290]: Log message C

Now I can get everything to parse as I want using the following filter:

logstash filter multiline

input {
  tcp {
    port => 15008
    type => "multiline"
  }
}

filter {
  if ( [type] == "multiline") {
    # Split off the syslog prefix; the rest of the line goes into newMessage
    grok {
      match => { "message" => [
        "^<(?<ignore>\d*)>(?<syslogDateTime>[\S]*)\s\[(?<pid>\d*)\]:.(?<newMessage>[\s\S]*)"
      ]}
    }

    # Append every line without the <<ML>> marker to the previous event
    multiline {
      pattern => "^[\s\S]*\<\<[M][L]\>\>"
      negate => true
      what => "previous"
      source => "newMessage"
      stream_identity => "%{host}.%{pid}"
    }
  }
}

This is exactly what I want in my logstash messages

output

message: Log message A
message: <<ML>> Log message B more B1 more B2 more B3
message: Log message C

However, that runs for a few minutes, then hangs and stops processing.

So I am trying to get it to work with the multiline codec, which is what Logstash recommends.

logstash codec multiline

input {
  tcp {
    port => 15008
    type => "multiline"
    # Join lines at the input stage, before any filter runs
    codec => multiline {
      pattern => "^[\s\S]*\<\<[M][L]\>\>"
      negate => true
      what => "previous"
    }
  }
}

filter {
  if ( [type] == "multiline") {
    grok {
      match => { "message" => [
        "^<(?<ignore>\d*)>(?<syslogDateTime>[\S]*)\s\[(?<pid>\d*)\]:.(?<newMessage>[\s\S]*)"
      ]}
    }
  }
}

It combines the lines correctly, but because the codec joins them before the grok filter runs, the syslog headers of the continuation lines now end up inside my multiline messages:

output

message: Log message A
message: <<ML>> Log message B <30>Jul 13 16:04:38 [1290]: more B1 <30>Jul 13 16:04:39 [1290]: more B2 <30>Jul 13 16:04:40 [1290]: more B3
message: Log message C

How can I get the codec-based configuration to produce the same output as the filter-based one?


Solution

  • OK, I got this to work by using the Logstash multiline codec and adding another filter after the grok match:

        mutate {
          gsub => [
            "message", "<\d*>[\s\S]*?\[\d*\]:.", " "
          ]
        }
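
For reference, the full pipeline might look like the sketch below. It only combines the codec-based input from the question with the mutate filter from this answer; the port, type, and patterns are copied unchanged from those snippets, and the exact placement of mutate inside the conditional is an assumption. Since gsub replaces every match in message, the syslog prefix on the first line of each event should be replaced by a space as well.

```conf
input {
  tcp {
    port => 15008
    type => "multiline"
    # Join continuation lines at the input stage (pattern from the question)
    codec => multiline {
      pattern => "^[\s\S]*\<\<[M][L]\>\>"
      negate => true
      what => "previous"
    }
  }
}

filter {
  if ( [type] == "multiline") {
    # Grok pattern copied unchanged from the question
    grok {
      match => { "message" => [
        "^<(?<ignore>\d*)>(?<syslogDateTime>[\S]*)\s\[(?<pid>\d*)\]:.(?<newMessage>[\s\S]*)"
      ]}
    }

    # Replace each embedded syslog prefix in the joined message with a space
    mutate {
      gsub => [
        "message", "<\d*>[\s\S]*?\[\d*\]:.", " "
      ]
    }
  }
}
```

Because the lines are joined in the input codec rather than in a filter, this setup avoids the multiline filter that the question describes as not thread safe.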