I have a log file (http://codepad.org/vAMFhhR2), and I want to extract a specific number from it (line 18). I wrote a grok filter with a custom pattern and tested it on http://grokdebug.herokuapp.com/, where it works fine and extracts the value I want.
Here is what my logstash.conf looks like:
input {
  tcp {
    port => 5000
  }
}
filter {
  grok {
    match => [ "message", "(?<scraped>(?<='item_scraped_count': ).*(?=,))" ]
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
But in Kibana, none of the records from the same log are matched.
Any thoughts?
Your regexp may be valid, but the lookahead and lookbehind ("?=" and "?<=") are not a good choice in this context. Instead, you could use a much simpler filter:
match => [ "message", "'item_scraped_count': %{NUMBER:scraped}" ]
This will extract the number after 'item_scraped_count': as a field called scraped, using the built-in Grok pattern NUMBER.
Result in Kibana:
{
  "_index": "logstash-2017.04.11",
  "_type": "logs",
  "_source": {
    "@timestamp": "2017-04-11T20:02:13.194Z",
    "scraped": "22",
    (...)
  }
}
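Note that scraped comes back as the string "22" in the result above. As a minimal sketch, assuming you want a numeric field in Elasticsearch, grok's optional type suffix can cast the capture to an integer. The full filter block would then look roughly like this:

```
filter {
  grok {
    # Capture the number following 'item_scraped_count': and cast it to an integer
    match => [ "message", "'item_scraped_count': %{NUMBER:scraped:int}" ]
  }
}
```

Events whose message does not contain the key are tagged _grokparsefailure by default, which is expected for the other lines of the log.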
If I may suggest another improvement: since your message is spread across multiple lines, you could easily merge it using the multiline input codec:
input {
  tcp {
    port => 5000
    codec => multiline {
      pattern => "^(\s|{')"
      what => "previous"
    }
  }
}
This will merge every line that starts with either whitespace or {' into the previous line.
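Putting the two suggestions together, a complete logstash.conf might look like the sketch below. The stdout output with the rubydebug codec is my addition for debugging and is not part of the original setup; it simply prints each event so you can check that the lines were merged and that the scraped field is present.

```
input {
  tcp {
    port => 5000
    codec => multiline {
      # Lines starting with whitespace or {' belong to the previous event
      pattern => "^(\s|{')"
      what => "previous"
    }
  }
}
filter {
  grok {
    # Unanchored pattern, so it can match anywhere in the merged message
    match => [ "message", "'item_scraped_count': %{NUMBER:scraped}" ]
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
  # Assumption: temporary debugging output, remove once the field shows up
  stdout { codec => rubydebug }
}
```

Because the grok pattern is not anchored to the start or end of the message, it should still match when the key sits in the middle of a merged multi-line event.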