Preprocessing a message containing multiple log records

TL;DR. Is it possible to preprocess a message by splitting it on newlines, and then have each resulting message go through the fluentd pipeline as usual?

I'm receiving these log messages in fluentd:

2018-09-13 13:00:41.251048191 +0000 : {"message":"146 <190>1 2018-09-13T13:00:40.685591+00:00 host app web.1 - 13:00:40.685 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Received GET /alerts\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"199 <190>1 2018-09-13T13:00:40.872670+00:00 host app web.1 - 13:00:40.871 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Processing with Api.AlertController.index/2 Pipelines: [:api]\n156 <190>1 2018-09-13T13:00:40.898316+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Rendered \"index.json\" in 1.0ms\n155 <190>1 2018-09-13T13:00:40.898415+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Sent 200 response in 209.70ms\n"}

The problem with these logs is the second message: it contains multiple application log lines.

This is, unfortunately, what I have to deal with: the system I'm working with (hello, Heroku logs!) buffers logs and then spits them out as a single chunk, making it impossible to know the number of records in the chunk up front.

This is a known property of Heroku log draining.

Is there a way to preprocess the log message, so that I get a flat stream of messages to be processed normally by subsequent fluentd facilities?

This is what the post-processed stream of messages should look like:

2018-09-13 13:00:41.251048191 +0000 : {"message":"146 <190>1 2018-09-13T13:00:40.685591+00:00 host app web.1 - 13:00:40.685 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Received GET /alerts\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"199 <190>1 2018-09-13T13:00:40.872670+00:00 host app web.1 - 13:00:40.871 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Processing with Api.AlertController.index/2 Pipelines: [:api]\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"156 <190>1 2018-09-13T13:00:40.898316+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Rendered \"index.json\" in 1.0ms\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"155 <190>1 2018-09-13T13:00:40.898415+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Sent 200 response in 209.70ms\n"}

P.S. My current config is super basic, but I'm posting it just in case. All I'm trying to understand is whether it's possible, in principle, to preprocess the message.

<source>
  @type http
  port 5140
  bind 0.0.0.0

  <parse>
    @type none
  </parse>
</source>

<filter **>
  @type stdout
</filter>

Solution

How about https://github.com/hakobera/fluent-plugin-heroku-syslog ?

fluent-plugin-heroku-syslog has not been maintained for about 4 years, but it will work with Fluentd v1 through the compatibility layer.
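
If you want to see what the splitting involves before reaching for a plugin, here is a minimal sketch in Python (not Fluentd code, just the logic). It assumes the number at the start of each record is a byte count for the text that follows the space; in your first sample, 146 does match the byte length of the rest of the message, trailing newline included. The function name split_frames is made up for illustration. Splitting by the count instead of on "\n" also keeps a record intact if the application log line itself contains a newline.

```python
def split_frames(body: bytes) -> list[bytes]:
    """Split a body of the form b"<len> <msg><len> <msg>..." into messages.

    Assumption: <len> is the number of bytes in <msg>, as the samples
    in the question suggest.
    """
    frames = []
    pos = 0
    while pos < len(body):
        # The length prefix is a run of ASCII digits ended by one space.
        space = body.index(b" ", pos)
        length = int(body[pos:space])
        start = space + 1
        end = start + length
        if end > len(body):
            raise ValueError("truncated frame")
        frames.append(body[start:end])
        pos = end
    return frames


# Two tiny records, each 6 bytes long including the trailing newline.
print(split_frames(b"6 hello\n6 world\n"))
```

Each element of the returned list would then be emitted as its own event, which gives the flat stream shown in the question.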