Currently my filter looks like this:
filter {
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:logLevel}\t<%{JAVACLASS:Job} *>\t%{TIMESTAMP_ISO8601:Date}\t%{GREEDYDATA:kvmessage}'
    }
  }
  if ([message] !~ "LogstashEvent") {
    drop { }
  }
  grok {
    match => {
      "kvmessage" => '%{SYSLOG5424SD}%{GREEDYDATA:kvmess}'
    }
  }
  kv {
    source => "kvmess"
    field_split => "|"
  }
}
I am taking input from a lot of files via the file input - at its peak it could be around 6000 files. But out of the huge number of logs generated, the ones I am interested in are only those containing "LogstashEvent", which is less than 1% of the total. I am guessing there has to be a faster way of doing this than what I am doing now. Right now Logstash's CPU utilization is about 20%, which is much higher than I expected, considering my filter is not large (although the volume of input is).
Also, I generated kvmessage above only so I could use it in the second grok; I don't actually need it in the final event. Is there any way to drop fields I no longer need?
A dummy log line illustrating the kind of logs I am trying to analyze:
[INFO] 2016-06-28 17:20:49,308 [LogstashEvent]a=1|b=talkischeap|c=showmecode|d=24|e=0
Is it possible that the CPU utilization of Logstash on each of my hosts is high because my Elasticsearch cluster can't index fast enough?
Move this:
if ([message] !~ "LogstashEvent") {
  drop { }
}
before the first grok filter. That will save your first grok filter about 99% of its current work.
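With that change the beginning of the filter would look something like this (patterns copied from your question; the second grok and the kv filter stay as they are):

filter {
  # Drop everything that doesn't contain "LogstashEvent" before any grok runs,
  # so the expensive pattern matching only sees the ~1% of interesting events.
  if ([message] !~ "LogstashEvent") {
    drop { }
  }
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:logLevel}\t<%{JAVACLASS:Job} *>\t%{TIMESTAMP_ISO8601:Date}\t%{GREEDYDATA:kvmessage}'
    }
  }
  # ... second grok and kv filter unchanged ...
}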
You can drop fields with the `remove_field` option, which is available in all filters.
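For example, since kvmessage and kvmess are only intermediate fields, you could let the kv filter clean them up once it has succeeded (a sketch based on the field names in your question):

kv {
  source => "kvmess"
  field_split => "|"
  # remove the intermediate fields once the key/value pairs have been extracted
  remove_field => [ "kvmessage", "kvmess" ]
}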
And I don't think indexing speed is causing this. When Elasticsearch can't index fast enough, Logstash will wait for it and stop reading the files.