Tags: logstash, elastic-stack, logstash-grok

Logstash Grok: parsing a line with the JSON filter


I am using the ELK stack (Elasticsearch, Logstash, Kibana, Filebeat) to collect logs. I have a log file with the following lines; every line contains a JSON object. My goal is to use Logstash Grok to extract the key/value pairs from the JSON and forward them to Elasticsearch.

2018-03-28 13:23:01  charge:{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}

2018-03-28 13:23:01  manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}

I am using the Grok Debugger to build a pattern and check the result. My current pattern is:

%{TIMESTAMP_ISO8601} %{SPACE} %{WORD:$:data}:{%{QUOTEDSTRING:key1}:%{BASE10NUM:value1}[,}]%{QUOTEDSTRING:key2}:%{BASE10NUM:value2}[,}]%{QUOTEDSTRING:key3}:%{QUOTEDSTRING:value3}[,}]%{QUOTEDSTRING:key4}:%{QUOTEDSTRING:value4}[,}]%{QUOTEDSTRING:key5}:%{BASE10NUM:value5}[,}]

As one can see, this is hard-coded: in a real log the keys in the JSON could be any word, the values could be integers, doubles, or strings, and the number of keys varies, so this approach does not scale. The result it produces is shown below, just for reference.

My questions: first, is it wise to extract the keys from the JSON with Grok at all, given that Elasticsearch itself works with JSON? Second, if I do extract the keys/values, is there a correct, concise Grok pattern for it?

The current pattern gives the following output when parsing the first line above:

{
  "TIMESTAMP_ISO8601": [
    [
      "2018-03-28 13:23:01"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "03"
    ]
  ],
  "MONTHDAY": [
    [
      "28"
    ]
  ],
  "HOUR": [
    [
      "13",
      null
    ]
  ],
  "MINUTE": [
    [
      "23",
      null
    ]
  ],
  "SECOND": [
    [
      "01"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "SPACE": [
    [
      ""
    ]
  ],
  "WORD": [
    [
      "charge"
    ]
  ],
  "key1": [
    [
      ""oldbalance""
    ]
  ],
  "value1": [
    [
      "5000"
    ]
  ],
  "key2": [
    [
      ""managefee""
    ]
  ],
  "value2": [
    [
      "0"
    ]
  ],
  "key3": [
    [
      ""afterbalance""
    ]
  ],
  "value3": [
    [
      ""5001""
    ]
  ],
  "key4": [
    [
      ""cardid""
    ]
  ],
  "value4": [
    [
      ""123456789""
    ]
  ],
  "key5": [
    [
      ""txamt""
    ]
  ],
  "value5": [
    [
      "1"
    ]
  ]
}

Second edit

Is it possible to use the Logstash JSON filter here? In my case the JSON is only part of the line/event; the whole event is not JSON.
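
For reference, the json filter can parse any single field of an event, not only the whole message, so it can be pointed at a field that Grok has extracted first. A minimal sketch, assuming the JSON substring has already been isolated into a field named json_string (an illustrative name, not from the log above):

filter {
  json {
    # parse the JSON text held in json_string; since no target is set,
    # the parsed keys are merged into the top level of the event
    source => "json_string"
  }
}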

===========================================================

Third edit

The updated solution does not parse the JSON correctly for me. My filter is as follows:

filter {
  grok {
    match => {
      "message" => [
           "%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}"
            ]
    }       
  }
}


filter {
  json{
    source => "json_data"
    target => "parsed_json"
  } 
}

The resulting event does not contain the key/value pairs; instead, json_data holds the message prefix plus the JSON string (e.g. manage:{...}), so the JSON is never actually parsed.

The test data is as follows:

2018-03-28 13:23:01  manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}
2018-03-28 13:23:03  payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}
2018-03-28 13:24:07  management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}

[2018-06-04T15:01:30,017][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"manage:{\"cuurentValue\":5000,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'manage': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 8]>}
[2018-06-04T15:01:30,017][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"payment:{\"cuurentValue\":5001,\"reload\":0,\"newbalance\":\"5002\",\"posid\":\"987654321\",\"something\":\"new3\",\"additionalFields\":2}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'payment': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}"; line: 1, column: 9]>}
[2018-06-04T15:01:34,986][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"management:{\"cuurentValue\":5002,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'management': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 12]>}

As the warnings show, the json filter chokes because GREEDYDATA captured the manage:/payment: prefix along with the JSON, so json_data is not valid JSON.


Solution

  • You can use GREEDYDATA to assign the entire JSON block to a separate field. Because each line carries a prefix (charge:, manage:, ...) in front of the JSON, capture that prefix with its own pattern so that json_data contains only valid JSON:

    %{TIMESTAMP_ISO8601}%{SPACE}%{WORD:type}:%{GREEDYDATA:json_data}
    

    This will create a separate field for your JSON data:

    {
      "TIMESTAMP_ISO8601": [
        [
          "2018-03-28 13:23:01"
        ]
      ],
      "type": [
        [
          "charge"
        ]
      ],
      "json_data": [
        [
          "{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}"
        ]
      ]
    }
    

    Then apply a json filter on the json_data field:

    json{
        source => "json_data"
        target => "parsed_json"
    }
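
  • Putting both steps together, the complete filter section might look like this (a sketch; the field names timestamp, type, and parsed_json are illustrative, and the mutate step that drops the intermediate field is optional):

    filter {
      grok {
        # split the line into timestamp, the word before the colon,
        # and the raw JSON string
        match => {
          "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{WORD:type}:%{GREEDYDATA:json_data}"
        }
      }
      json {
        # parse the isolated JSON string into a nested object
        source => "json_data"
        target => "parsed_json"
      }
      mutate {
        # the raw JSON string is no longer needed once it is parsed
        remove_field => ["json_data"]
      }
    }

    With target => "parsed_json" the keys end up nested under parsed_json (e.g. parsed_json.oldbalance); omit target if you prefer the keys merged into the top level of the event. This also answers the first question: there is no need to grok the individual keys, because the json filter hands Elasticsearch a proper JSON structure.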