Tags: logstash, elastic-stack, logstash-grok

grok filter fails for ISO8601 timestamps since 5.2


Since I upgraded our ELK stack from 5.0.2 to 5.2, our grok filters fail and I have no idea why. Maybe I've overlooked something in the changelogs?

Filter

filter {
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent} \"%{DATA:host_uri}\" \"%{DATA:proxy}\" \"%{DATA:upstream_addr}\" \"%{WORD:cache_status}\" \[%{NUMBER:request_time}\] \[(?:%{NUMBER:proxy_response_time}|-)\]" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
    mutate {
      convert => {
        "proxy_response_time" => "float"
        "request_time" => "float"
        "body_bytes_sent" => "integer"
      }
    }
  }
}

Error

Invalid format: \"2017-02-05T15:55:38+01:00\" is malformed at \"-02-05T15:55:38+01:00\"

Full Error

[2017-02-05T15:55:49,500][WARN ][logstash.outputs.elasticsearch] Failed action. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"filebeat-2017.02.05", :_type=>"nginx_access", :_routing=>nil}, 2017-02-05T14:55:38.000Z proxy2 4.3.2.1 - - [2017-02-05T15:55:38+01:00] "HEAD / HTTP/1.1" 200 0 "-" "Zabbix" "example.com" "host1:10040" "1.2.3.4:10040" "MISS" [0.095] [0.095]], :response=>{"index"=>{"_index"=>"filebeat-2017.02.05", "_type"=>"nginx_access", "_id"=>"AVoOxh7p5p68dsalXDFX", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [timestamp]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2017-02-05T15:55:38+01:00\" is malformed at \"-02-05T15:55:38+01:00\""}}}}}

The whole thing works perfectly on http://grokconstructor.appspot.com, and TIMESTAMP_ISO8601 still seems to be the right choice (https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).
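
Since the warning above comes from the elasticsearch output (a mapper_parsing_exception), grok itself appears to parse the line just fine. A quick way to double-check that (a throwaway debugging snippet, not part of my normal pipeline) is to dump the parsed events to stdout:

output {
  # prints every event with all the fields grok extracted, so the
  # parse result can be inspected before it reaches Elasticsearch
  stdout { codec => rubydebug }
}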

Techstack

  • Ubuntu 16.04
  • Elasticsearch 5.2.0
  • Logstash 5.2.0
  • Filebeat 5.2.0
  • Kibana 5.2.0

Any ideas?

Cheers, Finn

UPDATE

So this version works, for some reason:

filter {
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent} \"%{DATA:host_uri}\" \"%{DATA:proxy}\" \"%{DATA:upstream_addr}\" \"%{WORD:cache_status}\" \[%{NUMBER:request_time}\] \[(?:%{NUMBER:proxy_response_time}|-)\]" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ssZ" ]
      target => "timestamp"
    }
    mutate {
      convert => {
        "proxy_response_time" => "float"
        "request_time" => "float"
        "body_bytes_sent" => "integer"
      }
    }
  }
}

If someone can shed some light on why I have to redefine a valid ISO8601 date, I would be happy to know.


Solution

  • Make sure you specify the format of the timestamp you are expecting in your documents; the mapping could look like:

    PUT index
    {
      "mappings": {
        "your_index_type": {
          "properties": {
            "date": {
              "type":   "date",
              "format": "yyyy-MM-ddTHH:mm:ss+01:SS" <-- make sure to give the correct one
            }
          }
        }
      }
    }
    

    If you do not specify it correctly, Elasticsearch will expect the timestamp value in its default ISO format. Alternatively, you could do a date match for your timestamp field within your filter, which could look something like this:

    date {
        match => [ "timestamp" , "yyyy-MM-dd'T'HH:mm:ssZZ" ] <-- make sure this matches the timestamps in your logs (ZZ covers the +01:00 offset)
        target => "timestamp"
        locale => "en"
        timezone => "UTC"
    }
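
    The date filter also understands the literal ISO8601 keyword, so if you would rather not spell out the pattern by hand, a minimal variant (sketched here; adjust the field name to your pipeline) could be:

    date {
        match  => [ "timestamp" , "ISO8601" ]
        target => "timestamp"
    }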
    

    Or you could add a new field, match the timestamp into that, and then remove the original field if you aren't really using it, since the parsed timestamp now lives in the new field (see the sketch below). Hope it helps.
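
    A rough sketch of that last variant (the field name parsed_timestamp is just made up for illustration):

    date {
        # parse the raw string into a separate field instead of overwriting it
        match        => [ "timestamp" , "yyyy-MM-dd'T'HH:mm:ssZZ" ]
        target       => "parsed_timestamp"
        # drop the raw string once parsing succeeded, since the value
        # is kept in parsed_timestamp
        remove_field => [ "timestamp" ]
    }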