logstash-grok logstash-configuration

Parsing two formats of log messages in LogStash


A single log file contains log messages in two formats. The first looks like this:

Apr 22, 2017 2:00:14 AM org.activebpel.rt.util.AeLoggerFactory info
INFO:
======================================================
ActiveVOS 9.* version Full license.
Licensed for All application server(s), for 8 cpus,
License expiration date: Never.
======================================================

and the second like this:

Apr 22, 2017 2:00:14 AM org.activebpel.rt.AeException logWarning
WARNING: The product license does not include Socrates.

The first line has the same format in both cases, but the following lines can take either of two forms (written in pseudo-notation):
loglevel: <msg>, or
loglevel:<newline><many of =><newline><multiple line msg><newline><many of =>

I have the following configuration:
Query:

%{TIMESTAMP_MW_ERR:timestamp} %{DATA:logger} %{GREEDYDATA:info}%{SPACE}%{LOGLEVEL:level}:(%{SPACE}%{GREEDYDATA:msg}|%{SPACE}=+(%{GREEDYDATA:msg}%{SPACE})*=+)

Grok patterns:

AMPM (am|AM|pm|PM|Am|Pm)
TIMESTAMP_MW_ERR %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} %{AMPM}

Multiline filter:

%{LOGLEVEL}|%{GREEDYDATA}|=+

The problem is that every message is matched by the first alternative, %{SPACE}%{GREEDYDATA:msg}, so in the second case msg comes back as <many of =>. The other alternative, %{SPACE}=+(%{GREEDYDATA:msg}%{SPACE})*=+, is never used, probably because the first msg pattern also matches everything the second one matches.

How can I parse both of these msg formats?
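
The behaviour described in the question can be reproduced outside Logstash. The following is only a rough sketch in Python's re module, not grok itself: it assumes \s* stands in for %{SPACE} and .* for %{GREEDYDATA}, replaces %{LOGLEVEL} with a plain [A-Z]+, simplifies the second branch, and renames its capture to msg_banner (a hypothetical name) because Python does not accept two groups called msg.

```python
import re

# Approximate stand-ins for the grok pieces (assumptions, not grok's own definitions):
#   %{SPACE} -> \s*     %{GREEDYDATA} -> .*     %{LOGLEVEL} -> [A-Z]+
# The second branch is simplified, and its group is renamed msg_banner
# because Python's re rejects duplicate group names.
TAIL = re.compile(
    r"(?P<level>[A-Z]+):"
    r"(?:\s*(?P<msg>.*)"                       # branch 1: loglevel: <msg>
    r"|\s*=+\s(?P<msg_banner>[\s\S]*?)\s=+)"   # branch 2: loglevel: ===== ... =====
)

banner = "INFO:\n=====\nLicense expiration date: Never.\n====="
single = "WARNING: The product license does not include Socrates."

banner_match = TAIL.search(banner)
single_match = TAIL.search(single)

# Branch 1 succeeds on the banner as well: \s* consumes the newline and .*
# then takes the first row of '=' (it stops at the next newline), so
# branch 2 is never tried.
print(repr(banner_match.group("msg")))         # '====='
print(repr(banner_match.group("msg_banner")))  # None
print(repr(single_match.group("msg")))
```

Regex alternation is tried left to right and stops at the first branch that lets the overall match succeed, which is why the generic branch shadows the more specific one here.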


Solution

  • I fixed it with the following:
    Query:

    %{TIMESTAMP_MW_ERR:timestamp} %{DATA:logger} %{DATA:info}\s%{LOGLEVEL:level}:\s((=+\s%{GDS:msg}\s=+)|%{GDS:msg})
    

    Patterns:

    AMPM (am|AM|pm|PM|Am|Pm)
    TIMESTAMP_MW_ERR %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} %{AMPM}
    GDS (.|\s)*
    

    Multiline pattern:

    %{LOGLEVEL}|%{GREEDYDATA}
    

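    With this configuration both formats are parsed correctly: the =-delimited alternative is now tried first, and GDS, unlike GREEDYDATA, also matches across newlines.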
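
To sanity-check the fixed query without a running Logstash, it can be hand-translated into a Python regular expression. This is a sketch under stated assumptions rather than the grok engine itself: the timestamp and [A-Z]+ level patterns are simplified stand-ins for TIMESTAMP_MW_ERR and %{LOGLEVEL}, [\s\S]* stands in for GDS, and the names msg_banner, msg_plain, EVENT and parse are hypothetical helpers introduced only for this illustration.

```python
import re

# Hand translation of the grok query (an approximation, not grok itself):
#   TIMESTAMP_MW_ERR -> "Apr 22, 2017 2:00:14 AM" style timestamp
#   DATA -> .*?      LOGLEVEL -> [A-Z]+      GDS (.|\s)* -> [\s\S]*
# Python's re rejects duplicate group names, so the two msg captures get the
# hypothetical names msg_banner and msg_plain and are merged in parse().
EVENT = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} (?:AM|PM)) "
    r"(?P<logger>.*?) (?P<info>.*?)\s(?P<level>[A-Z]+):\s"
    r"(?:=+\s(?P<msg_banner>[\s\S]*)\s=+"   # =-delimited block, tried first
    r"|(?P<msg_plain>[\s\S]*))"             # anything else
)


def parse(event):
    """Return the captured fields of one multiline-joined event, or None."""
    match = EVENT.match(event)
    if match is None:
        return None
    fields = match.groupdict()
    banner = fields.pop("msg_banner")
    plain = fields.pop("msg_plain")
    fields["msg"] = banner if banner is not None else plain
    return fields


RULE = "=" * 54
banner_event = "\n".join([
    "Apr 22, 2017 2:00:14 AM org.activebpel.rt.util.AeLoggerFactory info",
    "INFO:",
    RULE,
    "ActiveVOS 9.* version Full license.",
    "Licensed for All application server(s), for 8 cpus,",
    "License expiration date: Never.",
    RULE,
])
plain_event = (
    "Apr 22, 2017 2:00:14 AM org.activebpel.rt.AeException logWarning\n"
    "WARNING: The product license does not include Socrates."
)

for event in (banner_event, plain_event):
    fields = parse(event)
    print(fields["level"], repr(fields["msg"]))
```

For the first event msg holds only the three license lines without the surrounding rows of =, and for the second it holds the single warning sentence.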