Tags: regex, elasticsearch, kubernetes, fluentd

Fluentd Regular Expression Matching Error


I am trying to parse Kubernetes logs such as this one:

2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}

This is my configuration:

<source>
  @id calico-node.log
  @type tail
  format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
  time_format %Y-%m-%d %H:%M:%S
  path /var/log/containers/calico-node**.log
  pos_file /var/log/es-calico.pos
  tag calico-node
</source>
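To check the pattern outside Fluentd, the sketch below applies the same regex to the sample line with Python's re module. It assumes Python's engine behaves like Fluentd's Ruby (Onigmo) engine for this pattern, which uses only simple character classes; the only syntax change is that Python spells named groups (?P<name>...). The names CALICO_RE, SAMPLE and parse_line are hypothetical.

```python
import re
from typing import Dict, Optional

# The regex from the <source> block, rewritten for Python: Python's re module
# spells named groups (?P<name>...) where Ruby/Fluentd uses (?<name>...).
CALICO_RE = re.compile(
    r"^(?P<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?P<severity>[^ \]]*) *\] (?P<message>.*)$"
)

# The plain calico-node line from the question (without any Docker JSON wrapper).
SAMPLE = (
    "2018-08-14 13:21:20.013 [INFO][67] health.go 150: "
    "Overall health summary=&health.HealthReport{Live:true, Ready:true}"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named captures as a dict, or None if the line does not match."""
    match = CALICO_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_line(SAMPLE)
    print(fields["time"])      # 2018-08-14 13:21:20.013
    print(fields["severity"])  # 67
```

The line does match, as regex101 reports. Note, though, that with this pattern the severity group captures 67 (the second bracket) rather than INFO, because \[[^\]]*\] consumes [INFO] first; that is separate from the time error.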

According to regex101.com, this pattern matches the string. However, Fluentd logs this warning when it tries to parse the line:

2018-08-14 13:21:20 +0000 [warn]: [calico-node.log] "{\"log\":\"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2018-08-14T13:21:20.013908223Z\"}" error="invalid time format: value = {\"log\":\"2018-08-14 13:21:20.013, error_class = ArgumentError, error = string doesn't match"
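The warning also shows that the tailed line is Docker's JSON wrapper ({"log":"...","stream":...,"time":...}), and the rejected value starts with {"log":". The sketch below, which assumes the raw file line is exactly the one quoted in the warning, reproduces that: applied to the raw JSON, the time group swallows the {"log":" prefix, while decoding the JSON first and matching only the log field yields a clean timestamp. In Fluentd this decode step is commonly done by parsing the tail source as JSON and then applying a parser filter to the log key; that setup is not part of the question. RAW, raw_time, inner and inner_time are hypothetical names.

```python
import json
import re

# Same pattern as in the <source> block, in Python syntax ((?P<name>...)).
CALICO_RE = re.compile(
    r"^(?P<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?P<severity>[^ \]]*) *\] (?P<message>.*)$"
)

# One container log line as quoted in the warning: Docker's json-file wrapper
# around the calico-node output (\u0026 and \n are JSON escapes in the file).
RAW = (
    '{"log":"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health '
    'summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n",'
    '"stream":"stdout","time":"2018-08-14T13:21:20.013908223Z"}'
)

# Applied to the raw JSON line, the time group includes the {"log":" prefix.
raw_time = CALICO_RE.match(RAW).group("time")

# Decoding the JSON first and matching only the "log" field gives a clean value.
inner = json.loads(RAW)["log"].rstrip("\n")
inner_time = CALICO_RE.match(inner).group("time")

if __name__ == "__main__":
    print(raw_time)    # {"log":"2018-08-14 13:21:20.013
    print(inner_time)  # 2018-08-14 13:21:20.013
```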

What could be wrong? I have had similar errors with the built-in parser for Apache logs as well.


Solution

  • From what I can see, the problem is the time_format in your Fluentd config.

    Your time_format %Y-%m-%d %H:%M:%S will not work with the timestamp 2018-08-14 13:21:20.013, because it has no directive for the fractional seconds (.013); it is missing .%3N.

    It should be either time_format %Y-%m-%d %H:%M:%S.%3N or time_format %Y-%m-%d %H:%M:%S.%L
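The failure mode can be illustrated with Python's datetime.strptime. This is an analogy rather than Fluentd's own parser: Python has no %L or %3N, so %f (which accepts one to six fractional digits) stands in for them. Without a fractional-seconds directive the trailing .013 is left unconsumed and parsing fails; with %f it succeeds. The last case uses the exact value from the warning, which still fails because of its {"log":" prefix. TIMESTAMP and try_parse are hypothetical names.

```python
from datetime import datetime
from typing import Optional

TIMESTAMP = "2018-08-14 13:21:20.013"


def try_parse(value: str, fmt: str) -> Optional[datetime]:
    """Return the parsed datetime, or None if strptime rejects the value."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


if __name__ == "__main__":
    # No fractional-seconds directive: ".013" is left over, so parsing fails.
    print(try_parse(TIMESTAMP, "%Y-%m-%d %H:%M:%S"))     # None
    # %f consumes the milliseconds (Python's stand-in for Ruby's %L / %3N).
    print(try_parse(TIMESTAMP, "%Y-%m-%d %H:%M:%S.%f"))  # 2018-08-14 13:21:20.013000
    # The value from the warning still carries the JSON prefix and fails either way.
    print(try_parse('{"log":"2018-08-14 13:21:20.013', "%Y-%m-%d %H:%M:%S.%f"))  # None
```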