Parse multiline JSON with grok in logstash


I have a JSON document in the following format:

{
    "SOURCE":"Source A",
    "Model":"ModelABC",
    "Qty":"3"
}

I'm trying to parse this JSON with Logstash. I want the Logstash output to be a set of key:value pairs that I can analyze in Kibana. I thought this would work out of the box. From a lot of reading, I understand that I must use the grok plugin (I am still not sure what the json plugin is for). However, I cannot get a single event containing all the fields. Instead I get multiple events, one for each attribute of my JSON, like so:

{
       "message" => "  \"SOURCE\": \"Source A\",",
      "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.432Z",
          "type" => "my-json",
          "tags" => [
        [0] "tag-json"
    ],
          "host" => "myserver.example.com",
          "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
       "message" => "  \"Model\": \"ModelABC\",",
      "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
          "type" => "my-json",
          "tags" => [
        [0] "tag-json"
    ],
          "host" => "myserver.example.com",
          "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
       "message" => "  \"Qty\": \"3\",",
      "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
          "type" => "my-json",
          "tags" => [
        [0] "tag-json"
    ],
          "host" => "myserver.example.com",
          "path" => "/opt/mount/ELK/json/mytestjson.json"
}

Should I use the multiline codec or the json_lines codec? If so, how? Do I need to write my own grok pattern, or is there something generic for JSON that will give me ONE EVENT containing the key:value pairs that are currently spread across the events above? I couldn't find any documentation that sheds light on this. Any help would be appreciated. My conf file is shown below:

input
{
        file
        {
                type => "my-json"
                path => ["/opt/mount/ELK/json/mytestjson.json"]
                codec => json
                tags => "tag-json"
        }
}

filter
{
   if [type] == "my-json"
   {
        date { locale => "en"  match => [ "RECEIVE-TIMESTAMP", "yyyy-MM-dd HH:mm:ss" ] }
   }
}

output
{
        elasticsearch
        {
                host => localhost
        }
        stdout { codec => rubydebug }
}
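
The symptom above is consistent with the file being read one line at a time: no single line of the pretty-printed document is valid JSON, while the document as a whole is. The following is a rough Python analogy, not Logstash code; the helper name `parses_as_json` is hypothetical and the sample text is copied from the question.

```python
import json

# The sample document from the question, as it sits in the file.
raw = '{\n    "SOURCE":"Source A",\n    "Model":"ModelABC",\n    "Qty":"3"\n}\n'

def parses_as_json(text):
    """Return True if text is a complete JSON document (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# A line-oriented reader sees five separate lines; none is valid JSON alone.
per_line = [parses_as_json(line) for line in raw.splitlines()]

# The same text taken as one unit parses into the three expected fields.
whole = json.loads(raw)
```

This is why the lines have to be joined into one message before any JSON parsing is attempted.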

Solution

  • I think I found a working answer to my problem. I am not sure it is a clean solution, but it parses multiline JSON documents of the type above.

    input 
    {   
        file 
        {
            codec => multiline
            {
                pattern => '^\{'
                negate => true
                what => previous                
            }
            path => ["/opt/mount/ELK/json/*.json"]
            start_position => "beginning"
            sincedb_path => "/dev/null"
            exclude => "*.gz"
        }
    }
    
    filter 
    {
        mutate
        {
            replace => [ "message", "%{message}}" ]
            gsub => [ 'message','\n','']
        }
        if [message] =~ /^{.*}$/ 
        {
            json { source => message }
        }
    
    }
    
    output 
    { 
        stdout { codec => rubydebug }
    }
    

    My multiline codec doesn't capture the last brace, so the message doesn't look like JSON to json { source => message }. Hence the mutate filter:

    replace => [ "message", "%{message}}" ]
    

    That adds the missing brace, and

    gsub => [ 'message','\n','']
    

    removes the \n characters introduced when the lines are joined. The result is a one-line JSON string that json { source => message } can read.

    If there's a cleaner or easier way to convert the original multiline JSON to a one-line JSON, please do post it, as I feel the above isn't very clean.
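
To make the steps in the answer concrete, here is a minimal Python sketch of what the configuration appears to do: group lines the way `pattern => '^\{'`, `negate => true`, `what => previous` would, then append the brace and strip newlines as the mutate filter does. It is an illustration under assumptions, not Logstash's implementation: the function names `group_multiline` and `to_single_line` are hypothetical, and unlike the real codec the sketch flushes the final buffered event at end of input.

```python
import json
import re

def group_multiline(lines, pattern=r"^\{"):
    # Mimics negate => true, what => previous: a line that does NOT match
    # the pattern is appended to the event that is currently being built.
    events, current = [], []
    for line in lines:
        if re.search(pattern, line) and current:
            events.append("\n".join(current))  # a new "{" starts a new event
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))      # assumption: flush at end of input
    return events

def to_single_line(message):
    # replace => [ "message", "%{message}}" ] appends a literal "}"
    message = message + "}"
    # gsub => [ 'message', '\n', '' ] removes the joining newlines
    return message.replace("\n", "")

# Lines as the answer reports receiving them: the closing brace is missing.
lines = ['{', '    "SOURCE":"Source A",', '    "Model":"ModelABC",', '    "Qty":"3"']

events = group_multiline(lines)
one_line = to_single_line(events[0])

# Same guard as the conditional in the filter block before parsing.
parsed = json.loads(one_line) if re.match(r"^\{.*\}$", one_line) else None
```

Note that appending "}" unconditionally assumes every event arrives without its closing brace; an event that already ends in "}" would become invalid JSON after this step.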