Preprocess (replace character) log body for syslog receiver


I use opentelemetry-collector (contrib 0.82.0) with the syslog receiver. For some logs I get the error "parsed value was not rfc3164 or rfc5424 compliant":

{
  "level": "error",
  "ts": 1693341000.02045,
  "caller": "helper/transformer.go:99",
  "msg": "Failed to process entry",
  "kind": "receiver",
  "name": "syslog",
  "data_type": "logs",
  "operator_id": "syslog_input_internal_parser",
  "operator_type": "syslog_parser",
  "error": "parsed value was not rfc3164 or rfc5424 compliant",
  "action": "send",
  "entry": {
    "observed_timestamp": "2023-08-29T20:30:00.020397777Z",
    "timestamp": "0001-01-01T00:00:00Z",
    "body": "<7>Aug 29 16:30:00 dev app-logs: 2023-08-29 20:29:11,604 [app] ERROR error log\\n\tat backtrace",
    "severity": 0,
    "scope_name": ""
  },
  "stacktrace": "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/helper/transformer.go:99\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/helper/parser.go:140\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/helper/parser.go:112\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/syslog.(*Parser).Process\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/parser/syslog/syslog.go:153\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/helper/writer.go:53\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/tcp.(*Input).handleMessage\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/input/tcp/tcp.go:321\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/tcp.(*Input).goHandleMessages.func1\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.82.0/operator/input/tcp/tcp.go:282"
}

The root cause is that the log body contains non-printable characters (e.g. tabs in a Java backtrace), which violates RFC3164. The problem is that I can't configure the sender to follow RFC3164.

Can I configure an operator in the syslog receiver that preprocesses the message body before the actual RFC3164 syslog parser runs, to avoid this problem? I think simply replacing \t with \\t would be enough.

Processors are of course designed for data modification, but I can't use one here because the problem occurs at the receiver level.


Solution

  • You can receive logs using the tcplog or udplog receiver, preprocess them with operators, and then pass the data to the syslog_parser operator.

    The following receiver configs are equivalent except that the second allows for preprocessing:

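    receivers:
      # 1) syslog receiver: parsing is built in, so nothing can run before it
      syslog:
        tcp:
          listen_address: ...
        protocol: ...
      # 2) tcplog receiver: same input, but operators can run before the parser
      tcplog:
        listen_address: ...
        operators:
          # preprocess here
          - type: syslog_parser
            protocol: ...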

    You can use the add operator to reformat and overwrite the log's body. Like many other operators, the add operator supports an expression language, which includes a replace function.

    receivers:
      tcplog:
        listen_address: ...
        operators:
          - type: add
            field: body
            value: EXPR(replace(body, '\t', '\\t'))
          - type: syslog_parser
            protocol: rfc3164
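
    To try this end to end, the receiver has to be wired into a logs pipeline. The following is a minimal sketch, not a tested configuration: the listen address and port are hypothetical placeholders, and the logging exporter is only an assumed way to print the parsed entries to the collector's console for inspection. Substitute your own address and exporter.

    ```yaml
    receivers:
      tcplog:
        # Hypothetical address and port; use whatever your sender targets.
        listen_address: "0.0.0.0:54526"
        operators:
          # Escape tabs in the raw message before it reaches the parser.
          - type: add
            field: body
            value: EXPR(replace(body, '\t', '\\t'))
          # Parse the cleaned-up body as RFC3164 syslog.
          - type: syslog_parser
            protocol: rfc3164

    exporters:
      # Assumption: console output for debugging only.
      logging:
        verbosity: detailed

    service:
      pipelines:
        logs:
          receivers: [tcplog]
          exporters: [logging]
    ```

    If the replacement works, the "Failed to process entry" error should no longer appear for messages containing tabs, and the exported records should carry a parsed timestamp instead of 0001-01-01T00:00:00Z.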