Tags: types, mapping, logstash, kibana, grok

Can't force grok parser to enforce integer/float types on HAProxy logs


It doesn't matter whether I use integer/long or float: fields like time_duration (all the time_* fields, really) show up as strings in the Logstash index in Kibana.

I also tried the mutate convert approach (https://www.elastic.co/blog/little-logstash-lessons-part-using-grok-mutate-type-data), roughly as sketched below, but that did not work either.
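
What I tried was along these lines (a sketch only; the field names come from my HAPROXYHTTP pattern further down):

filter {
  if [type] == "haproxy" {
    mutate {
      # try to force the grok-captured fields to numeric types
      convert => [ "time_duration", "integer",
                   "time_request", "integer",
                   "time_queue", "integer",
                   "time_backend_connect", "integer",
                   "time_backend_response", "integer" ]
    }
  }
}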

How can I correctly enforce numeric types instead of strings on these fields?

My /etc/logstash/conf.d/haproxy.conf:

input {
  syslog {
    type => haproxy
    port => 5515
  }
}
filter {
  if [type] == "haproxy" { 
    grok {
      patterns_dir => "/usr/local/etc/logstash/patterns"
      match => ["message", "%{HAPROXYHTTP}"]
      named_captures_only => true
    }
    geoip {
      source => "client_ip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float"]
    }
  }
}

And my pattern for HAPROXYHTTP:

HAPROXYHTTP  %{IP:client_ip}:%{INT:client_port} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request:int}/%{INT:time_queue:int}/%{INT:time_backend_connect:int}/%{INT:time_backend_response:int}/%{NOTSPACE:time_duration:int} %{INT:http_status_code} %{NOTSPACE:bytes_read:int} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn:int}/%{INT:feconn:int}/%{INT:beconn:int}/%{INT:srvconn:int}/%{NOTSPACE:retries:int} %{INT:srv_queue:int}/%{INT:backend_queue:int} (\{%{HAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )?"(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?"

Solution

  • It's quite possible that Logstash is doing the right thing here (your configuration looks correct), but how Elasticsearch maps the fields is another matter. Once a field in an Elasticsearch index has been dynamically mapped as a string, it stays a string for all subsequent documents added to that index, even if the values are integers or floating-point numbers in the source documents. To change this you have to reindex, but with time-series-based Logstash indexes you can simply wait until the next day, when a new index is created with fresh mappings. You can check and steer the mapping as sketched below.
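
A quick way to confirm this is to inspect the current mapping of the daily index (the index pattern below assumes the default logstash-YYYY.MM.DD naming; adjust to your setup):

curl -XGET 'http://localhost:9200/logstash-*/_mapping?pretty'

If the time_* fields are listed there as "string", the mapping for that index is already locked in. To make sure future indexes pick up numeric mappings regardless of which document arrives first, you can also install an index template with a dynamic template for those fields. This is a sketch using the pre-5.x template syntax ("template" rather than "index_patterns"), since the GeoLiteCity.dat configuration above suggests an older stack; the template and rule names are just examples:

curl -XPUT 'http://localhost:9200/_template/haproxy_numeric_fields' -d '
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "time_fields_as_integers": {
            "match": "time_*",
            "mapping": { "type": "integer" }
          }
        }
      ]
    }
  }
}'

The template only affects indexes created after it is installed, which fits the "wait for the next day's index" approach described above.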