regex, logging, fluentd

Regex Pattern for a Java Log


I am trying to use the regex Parser Plugin in fluentd to index the logs of my application.

Here's a snippet of it.

2020-05-06T22:34:50.860-0700 - WARN [main] o.s.b.GenericTypeAwarePropertyDescriptor: Invalid JavaBean property 'pipeline' being accessed! Ambiguous write methods found next to actually used [public void com.theoaal.module.pipeline.mbean.DynamicPhaseExecutionConfigurationMBeanBuilder.setPipeline(com.theplatform.module.pipeline.DynamicPipeline)]: [public void com.theplatform.module.pipeline.mbean.PhaseExecutionConfigurationMBeanBuilder.setPipeline(com.theoaal.module.pipeline.Pipeline)]

I tested the following pattern on regex101.com, but it does not match the log line.

^(?<date>\d{4}\-\d{2}\-\d{2})(?<timestamp>[A-Z][a-z]{1}\d{2}:\d{2}:\d{2}.\d{3}\-\d{4})\s\-\s(?<loglevel>\[\w\]{6})\s+(?<class>\[[A-Z][a-z]+\])\s(?<message>.*)$

Kindly help. Thanks


Solution

  • You may use

    ^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)
    

    See the regex demo

    Note that in your pattern, \[\w\]{6} only matches [, a single word char and six ] chars, so it cannot match WARN. In the timestamp pattern, [A-Z][a-z]{1} requires an uppercase letter followed by a lowercase letter, but there is only a single T. Your "class" pattern requires a capitalized word with [A-Z][a-z]+, but main is all lowercase. You also escape - outside of character classes, which is unnecessary, and you do not escape the literal dot before the milliseconds, so it matches any character.
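
The points in the note above can be checked one fragment at a time. The following is a minimal sketch in Ruby, chosen because Fluentd's regexp parser uses Ruby regular expressions. The `checks` hash, the test strings and the added `\A`/`\z` anchors are illustrative assumptions, not part of the original pattern.

```ruby
# Each entry pairs a fragment of the original pattern with the piece of the
# sample log line it was meant to capture. The anchors (\A, \z) are added here
# only to make each check strict; they are not part of the original pattern.
checks = {
  "loglevel"    => [/\A\[\w\]{6}\z/, "WARN"],        # wants "[", one word char, six "]"
  "time prefix" => [/\A[A-Z][a-z]{1}\d{2}/, "T22"],  # wants two letters before the digits
  "class"       => [/\A\[[A-Z][a-z]+\]\z/, "[main]"] # wants a capitalized word
}

results = checks.map { |name, (regex, text)| [name, regex.match?(text)] }.to_h
results.each { |name, matched| puts "#{name}: #{matched ? 'match' : 'no match'}" }

# The kind of text \[\w\]{6} does accept:
odd_input_matches = /\A\[\w\]{6}\z/.match?("[a]]]]]]")

# An unescaped dot matches any character, not just a literal ".":
loose_dot_matches = /\A\d{2}.\d{3}\z/.match?("50x860")

puts "odd input: #{odd_input_matches}, loose dot: #{loose_dot_matches}"
```

All three fragments report "no match" against the text they were meant to capture, which is why the full pattern fails on the sample line.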

    Details

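    • ^ - start of the line (in Ruby regex, which Fluentd uses, ^ anchors at the start of each line; for a single-line log record this is the start of the string)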
    • (?<date>\d{4}-\d{2}-\d{2}) - date: 4 digits, -, 2 digits, -, 2 digits
    • [A-Z] - an uppercase ASCII letter
    • (?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4}) - 2 digits, :, 2 digits, :, 2 digits, ., 3 digits, - and 4 digits
    • \s+-\s+ - a hyphen enclosed with 1+ whitespaces on each side
    • (?<loglevel>\w+) - 1+ word chars
    • \s+ - 1+ whitespaces
    • (?<class>\[\w+\]) - [, 1+ word chars, ]
    • \s+ - 1+ whitespaces
    • (?<message>.*) - the rest of the line.
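
To see which text lands in each named group, the pattern can be run against the sample line outside Fluentd. This is a small Ruby sketch under the assumption that plain Ruby behaves the same as Fluentd's regexp parser for this pattern. The names `LOG_PATTERN`, `line` and `fields` are illustrative, and the message is shortened to its first sentence.

```ruby
# The pattern from the answer, unchanged.
LOG_PATTERN = /^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)/

# Sample line from the question, shortened after the first sentence of the message.
line = "2020-05-06T22:34:50.860-0700 - WARN [main] " \
       "o.s.b.GenericTypeAwarePropertyDescriptor: Invalid JavaBean property 'pipeline' being accessed!"

match = LOG_PATTERN.match(line)

# named_captures returns a hash of group name => captured text.
fields = match ? match.named_captures : {}
fields.each { |name, value| puts "#{name}: #{value}" }
```

Note that the class group captures the bracketed thread name, [main], and the logger name o.s.b.GenericTypeAwarePropertyDescriptor ends up at the start of the message group.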

    Copy and paste to fluent.conf or td-agent.conf:

    <source>
      @type tail
      path /var/log/foo/bar.log
      pos_file /var/log/td-agent/foo-bar.log.pos
      tag foo.bar
      format /^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)/
    </source>
    
