Tags: regex, logstash, elastic-stack, logstash-grok

Logstash grok filter custom pattern is not working


I have a log file (http://codepad.org/vAMFhhR2) and I want to extract a specific number from it (line 18). I wrote a custom grok filter pattern and tested it on http://grokdebug.herokuapp.com/, where it works fine and extracts the value I want.

Here is what my logstash.conf looks like:

input {
    tcp {
        port => 5000
    }
}

filter {
    grok {
        match => [ "message", "(?<scraped>(?<='item_scraped_count': ).*(?=,))" ]
    }
}

output {
    elasticsearch {
        hosts => "elasticsearch:9200"
    }
}

But when I look at the same log in Kibana, none of the records have been matched.

Thoughts?


Solution

  • Your regexp may be valid, but the lookahead and lookbehind ("?=" and "?<=") are not a good choice in this context. Instead, you could use a much simpler filter:

    match => [ "message", "'item_scraped_count': %{NUMBER:scraped}" ]
    

    This extracts the number that follows 'item_scraped_count': into a field called scraped, using Grok's built-in NUMBER pattern.

    Result in Kibana:

    {
      "_index": "logstash-2017.04.11",
      "_type": "logs",
      "_source": {
        "@timestamp": "2017-04-11T20:02:13.194Z",
        "scraped": "22",
        (...)
      }
    }
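
    Note that the result above stores scraped as the string "22". If a numeric field is more useful (for sorting or aggregations, say), grok accepts an optional type suffix on the capture; only int and float are supported. A minimal sketch of that variation, which is an addition to the filter shown earlier and not part of the original setup:

    filter {
        grok {
            # ":int" asks grok to store the captured value as an integer instead of a string
            match => [ "message", "'item_scraped_count': %{NUMBER:scraped:int}" ]
        }
    }

    If an existing index has already mapped scraped as a string, that mapping will most likely stay in place, so the numeric type would only show up in a newly created index.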
    

    If I may suggest another improvement: since your message is spread across multiple lines, you could merge it into a single event with the multiline codec on the input:

    input {
        tcp {
            port => 5000
            codec => multiline {
                pattern => "^(\s|{')"
                what => "previous"
            }
        }
    }
    

    This merges every line that starts with either whitespace or {' into the previous one.
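
    Putting both suggestions together, the complete logstash.conf might look like the sketch below. The port, the Elasticsearch host, and the patterns are taken from the snippets above; the auto_flush_interval setting and its value of 2 seconds are an assumption added here, on the guess that the last multi-line message should not wait for a following line before being emitted.

    input {
        tcp {
            port => 5000
            # Merge continuation lines (leading whitespace or {') into the previous event
            codec => multiline {
                pattern => "^(\s|{')"
                what => "previous"
                # Assumption: flush a pending event after 2 seconds without new lines
                auto_flush_interval => 2
            }
        }
    }

    filter {
        grok {
            # Capture the number after 'item_scraped_count': into the field "scraped"
            match => [ "message", "'item_scraped_count': %{NUMBER:scraped}" ]
        }
    }

    output {
        elasticsearch {
            hosts => "elasticsearch:9200"
        }
    }

    Events that the grok pattern does not match are tagged _grokparsefailure by default, so filtering on that tag in Kibana is a quick way to check whether the filter is being applied at all.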