
Grok conditional parsing for ELK Stack


I have this kind of log:

2020-09-02 14:29:22,854 [http-something] [ERROR] JavaClass(JavaLine) - [6652942]: Error message with no stack trace
2020-09-02 14:29:08,976 [http-something] [INFO] JavaClass(JavaLine) - [6791732]: Some message
2020-09-02 14:29:09,116 [http-something] [ERROR] JavaClass(JavaLine) - [6791732]: Error message with stack trace
JavaException: This is not going well
    at JavaClass
    at JavaClass
    at JavaClass
    at JavaClass
    at JavaClass
Caused by: JavaClass: This is a problem
    at JavaClass
    at JavaClass
    at JavaClass
    at JavaClass
    ... 48 more

And I use this filter to make the log more readable in Kibana:

filter {

    # INFO and ERROR
    grok {
        tag_on_failure => ["_stackTraceFailure"]
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}(\[%{DATA:thread}\])?%{SPACE}\[%{LOGLEVEL:log_level}\]%{SPACE}%{GREEDYDATA}%{SPACE}\-%{SPACE}%{GREEDYDATA:action}" }
        overwrite => [ "message" ]
    }

    # JAVA ERROR
    if ("_stackTraceFailure" in [tags]) {
        grok { 
            tag_on_failure => ["_grokParseFailure"]
            match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}(\[%{DATA:thread}\])?%{SPACE}\[%{LOGLEVEL:log_level}\]%{SPACE}%{GREEDYDATA}%{SPACE}\-%{SPACE}%{DATA:issue}(\r|\n)+(?m)%{GREEDYDATA:stack-trace}" }
            overwrite => [ "message" ]
            remove_tag => "_stackTraceFailure"
        }
    }
}

The problem is that the first pattern matches everything. It puts the whole stack trace (when there is one) in the action field, so the second pattern is never used. I know this is caused by GREEDYDATA, but I'm not really skilled with regex and I can't find a way to do what I want.

I don't want to swap the order of the patterns, because INFO and ERROR lines without a stack trace are far more common. So I need a way to make the first pattern fail for a multiline log, or anything else that makes it fail when there is a stack trace. How can I do that, starting from what I have done so far?


Solution

  • You need to use conditionals before your groks. You can either use a conditional on the entire message to choose between your two grok filters, or keep your first grok filter as it is and use a conditional to parse only the action field. I would suggest the second option.

    In both cases the conditional has to test for something that only appears in your multiline messages. With your sample logs, that could be the "at JavaClass" string.

    So you would need something like this:

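    if "at JavaClass" not in [message] {
        # no stack trace: use the single-line INFO/ERROR pattern
        grok { match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}(\[%{DATA:thread}\])?%{SPACE}\[%{LOGLEVEL:log_level}\]%{SPACE}%{GREEDYDATA}%{SPACE}\-%{SPACE}%{GREEDYDATA:action}" } }
    } else {
        # stack trace present: use the multiline pattern
        grok { match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}(\[%{DATA:thread}\])?%{SPACE}\[%{LOGLEVEL:log_level}\]%{SPACE}%{GREEDYDATA}%{SPACE}\-%{SPACE}%{DATA:issue}(\r|\n)+(?m)%{GREEDYDATA:stack-trace}" } }
    }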
    

    If you want to keep your first grok and use a second one to parse only the action field, it would be something like this:

    if "at JavaClass" in [action] {
        grok {
            tag_on_failure => ["_grokParseFailure"]
            match => { "action" => "%{DATA:issue}(\r|\n)+(?m)%{GREEDYDATA:stack-trace}" }
        }
    }
    

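    You didn't say how you are collecting your logs. If the lines are joined by the multiline codec in a Logstash input, each joined event gets a tag named multiline by default, so you could also filter on the tags (if "multiline" in [tags]). If you use Filebeat's multiline settings instead, check which tag or field your version adds to joined events before relying on it.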
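A minimal sketch of the tag-based approach, assuming the logs are read by a Logstash file input with the multiline codec. The path is a hypothetical placeholder, and the codec pattern assumes every new event starts with an ISO8601 timestamp, as in the sample above. The codec tags joined events with multiline by default (its multiline_tag option), so the filter can branch on that tag and send each event through exactly one of the two patterns from the question:

```logstash
input {
  file {
    path => "/var/log/myapp/app.log"   # hypothetical path, adjust to your setup
    codec => multiline {
      # any line that does NOT start with a timestamp is appended to the previous event
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
      # joined events get the default "multiline" tag
    }
  }
}

filter {
  if "multiline" in [tags] {
    # stack-trace events: first line goes to issue, the remaining lines to stack-trace
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}(\[%{DATA:thread}\])?%{SPACE}\[%{LOGLEVEL:log_level}\]%{SPACE}%{GREEDYDATA}%{SPACE}\-%{SPACE}%{DATA:issue}(\r|\n)+(?m)%{GREEDYDATA:stack-trace}" }
    }
  } else {
    # single-line INFO/ERROR events
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}(\[%{DATA:thread}\])?%{SPACE}\[%{LOGLEVEL:log_level}\]%{SPACE}%{GREEDYDATA}%{SPACE}\-%{SPACE}%{GREEDYDATA:action}" }
    }
  }
}
```

Because each event goes through only one grok, the _stackTraceFailure tag and the remove_tag step are no longer needed. If your events do not carry the multiline tag, a content-based test such as if [message] =~ /\n/ (true when the joined event contains a line break) is one possible alternative to matching on "at JavaClass", since it does not depend on the text of the stack frames.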