Currently my filter looks like this:
filter {
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:logLevel}\t<%{JAVACLASS:Job} *>\t%{TIMESTAMP_ISO8601:Date}\t%{GREEDYDATA:kvmessage}'
    }
  }
  if ([message] !~ "LogstashEvent") {
    drop { }
  }
  grok {
    match => {
      "kvmessage" => '%{SYSLOG5424SD}%{GREEDYDATA:kvmess}'
    }
  }
  kv {
    source => "kvmess"
    field_split => "|"
  }
}
I am taking input from a lot of files via the file input - at its peak it could be around 6000 files. But out of the huge number of logs generated, the ones I am interested in are only those containing "LogstashEvent", which is less than 1% of the total. I am guessing there has to be a faster way of doing this than what I am doing now. Right now Logstash's CPU utilization is about 20%, which is much higher than I expected, considering my filter is not large (although the volume of input is).
Also, I generated kvmessage above only so I could use it in the second grok; I don't actually need it in the final event. Is there any way to drop fields I no longer need?
A dummy log line illustrating the kind of logs I am trying to analyze:
[INFO] 2016-06-28 17:20:49,308 [LogstashEvent]a=1|b=talkischeap|c=showmecode|d=24|e=0
Is it possible that the CPU utilization of Logstash on each of my hosts is high because my Elasticsearch cluster can't index fast enough?
Move this:
if ([message] !~ "LogstashEvent") {
  drop { }
}
before the first grok filter. That will save your first grok filter about 99% of its current work.
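With that change the beginning of the filter would look something like this (patterns copied from your question; the second grok and the kv filter stay as they are):

filter {
  # Drop everything that doesn't contain "LogstashEvent" before any grok runs,
  # so the expensive pattern matching only sees the ~1% of interesting events.
  if ([message] !~ "LogstashEvent") {
    drop { }
  }
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:logLevel}\t<%{JAVACLASS:Job} *>\t%{TIMESTAMP_ISO8601:Date}\t%{GREEDYDATA:kvmessage}'
    }
  }
  # ... second grok and kv filter unchanged ...
}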
You can drop fields with the `remove_field` option, which is available in all filters.
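For example, since kvmessage and kvmess are only intermediate fields, you could let the kv filter clean them up once it has succeeded (a sketch based on the field names in your question):

kv {
  source => "kvmess"
  field_split => "|"
  # remove the intermediate fields once the key/value pairs have been extracted
  remove_field => [ "kvmessage", "kvmess" ]
}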
And I don't think indexing speed is causing this. When Elasticsearch can't index fast enough, Logstash will wait for it and stop reading the files.