
ElasticSearch Aggregation of Text or Integer


We have a process where our web services create log records in ElasticSearch (C#, using NEST). The ES index names include the month and year.

An aggregation program (C#, not using NEST) pulls near-real-time information from the various logs. Its query combines a date histogram, some terms sources (host, IP address, etc.), and sums over some integer fields, in a request similar to this:

{
    "size":0,
    "query": {
        "range":{"date":{"gt":"2018-10-01T00:00:00","lte":"2018-10-01T01:00:00"}}
    },
    "aggs": {
        "myBuckets": {
            "composite": {
                "size":100,
                "sources": [
                    {"host":{"terms":{"field":"host.keyword","missing":""}}},
                    {"ipAddress":{"terms":{"field":"ipAddress.keyword","missing":""}}},
                    {"date":{"date_histogram":{"field":"date","interval":"1h"}}}
                ]
            },
            "aggregations": {
                "records":{"sum":{"field":"records","missing":0}}
            }
        }
    }
}
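To make the semantics of this request concrete, here is a small in-memory Java model of what it computes. Records inside the date range are grouped by (host, ipAddress, hour). A missing host or IP becomes "" (the "missing" setting on the terms sources), and `records` is summed with absent values counted as 0. This is an illustrative sketch only: the `LogRecord` type and its field names are hypothetical, and Elasticsearch of course does this server-side.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class CompositeSumSketch {
    // Hypothetical shape of a log record; a null field models a value absent from the document.
    record LogRecord(String host, String ipAddress, Instant date, Integer records) {}

    // Groups by (host, ipAddress, hour) and sums "records", mirroring the request above.
    // Assumes every record has a date (the range query excludes documents without one).
    static Map<String, Long> aggregate(List<LogRecord> docs, Instant gt, Instant lte) {
        Map<String, Long> buckets = new TreeMap<>(); // sorted only for stable output
        for (LogRecord d : docs) {
            // "range": {"date": {"gt": ..., "lte": ...}}
            if (!d.date().isAfter(gt) || d.date().isAfter(lte)) {
                continue;
            }
            // "missing": "" on the terms sources
            String host = Objects.requireNonNullElse(d.host(), "");
            String ip = Objects.requireNonNullElse(d.ipAddress(), "");
            // date_histogram with "interval": "1h" (UTC): floor to the start of the hour
            Instant hour = d.date().truncatedTo(ChronoUnit.HOURS);
            // "missing": 0 on the sum
            long value = (d.records() == null) ? 0 : d.records();
            buckets.merge(host + "|" + ip + "|" + hour, value, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2018-10-01T00:00:00Z");
        Instant to = Instant.parse("2018-10-01T01:00:00Z");
        List<LogRecord> docs = List.of(
            new LogRecord("web1", "10.0.0.1", Instant.parse("2018-10-01T00:10:00Z"), 5),
            new LogRecord("web1", "10.0.0.1", Instant.parse("2018-10-01T00:50:00Z"), 7),
            new LogRecord("web1", null, Instant.parse("2018-10-01T00:20:00Z"), null),
            new LogRecord("web2", "10.0.0.2", Instant.parse("2018-10-01T02:00:00Z"), 100)); // out of range
        aggregate(docs, from, to).forEach((k, v) -> System.out.println(k + " -> " + v));
        // web1|10.0.0.1|2018-10-01T00:00:00Z -> 12
        // web1||2018-10-01T00:00:00Z -> 0
    }
}
```

The record without an IP still produces a bucket (keyed on ""), and its missing `records` value contributes 0 rather than dropping the document.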

The problem is with these integer fields: occasionally a rogue or buggy web service sends a string instead of an integer. This causes ES to change the index's mapping of the field (from integer to string), which breaks the aggregator.

Fixing the index with a reindex is not an option; we'd prefer to handle this on the fly if possible.

My current plan is to read the index's mapping and, if the field is not numeric, switch the sum aggregation to a Painless script similar to this:

doc['badField.keyword'].value!=null ? Integer.parseInt(doc['badField.keyword'].value) : 0
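The plan above can be sketched as a function that takes the field's mapped type (assumed to have been read beforehand from the mapping of the index being queried) and returns the matching sum aggregation body: a plain field sum when the type is numeric, otherwise a script sum over the .keyword sub-field. The class and method names are hypothetical. The script is the one from the question with a size() guard instead of the null check, so documents without a value yield 0. Note that Integer.parseInt still fails on a value like "abc", so this only handles the type switch, not genuinely malformed data.

```java
import java.util.Set;

public class SumAggregationChooser {
    // Elasticsearch numeric field types that a plain "sum" on the field can handle.
    private static final Set<String> NUMERIC_TYPES = Set.of(
        "long", "integer", "short", "byte", "double", "float", "half_float", "scaled_float");

    // Builds the JSON for the sum sub-aggregation, given the type the field is mapped as.
    static String sumAggregation(String field, String mappedType) {
        if (NUMERIC_TYPES.contains(mappedType)) {
            return "{\"sum\":{\"field\":\"" + field + "\",\"missing\":0}}";
        }
        // Dynamically mapped as text: sum the .keyword sub-field with a Painless script.
        // The size() guard returns 0 for documents with no value for the field.
        String kw = "doc['" + field + ".keyword']";
        String source = kw + ".size() == 0 ? 0 : Integer.parseInt(" + kw + ".value)";
        return "{\"sum\":{\"script\":{\"lang\":\"painless\",\"source\":\"" + source + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(sumAggregation("records", "long"));
        System.out.println(sumAggregation("records", "text"));
    }
}
```

The returned string would replace the body of the "records" entry under "aggregations" in the request shown earlier.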

Is there a better way to handle this situation? If not, is there a more robust way of scripting the integer conversion?


Solution

  • ... ES to change the index's mapping of the field

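    ES never changes the mapping of a field once it has been created. The only way the field can end up mapped as a string is if, in a new index, the first record that contains the field has a string value instead of an integer. Since your index names include the month and year, a new index (with a new dynamic mapping) is created every month, so one bad record arriving first is enough to map the field as text (with a .keyword sub-field) for that month's index.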

    You can avoid this by creating an index template before the next monthly index is created (that is, before its first record is indexed):

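    PUT _template/my-template
    {
      "index_patterns": ["my-index*"],
      "mappings": {
        "_doc": {
          "properties": {
            "my_integer_field": {
              "type": "integer",
              "ignore_malformed": true
            }
          }
        }
      }
    }

    The "type": "integer" mapping will always be honored for new indices whose names match the pattern, and "ignore_malformed": true means that a value which really isn't an integer is skipped for that field instead of causing the whole document to be rejected.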
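As a rough illustration of why the template helps, the Java sketch below models both situations. Under dynamic mapping, the first value seen decides the field type (a JSON string gives text, a whole number gives long). Under the template above, a numeric string is coerced to an integer (numeric fields have coerce enabled by default) and a malformed value is skipped, so the sum with "missing":0 from the question counts it as 0. This is a simplified model written for illustration, not Elasticsearch code. The class and method names are hypothetical, and edge cases such as decimal strings (which coerce would truncate) are not modeled.

```java
import java.util.List;

public class MappingSketch {
    // Simplified dynamic mapping: the first value seen for a field in a new index fixes its type.
    static String dynamicType(Object firstValue) {
        return (firstValue instanceof String) ? "text" : "long";
    }

    // Simplified indexing into a field mapped as {"type": "integer", "ignore_malformed": true}.
    // Returns null when the value is skipped, i.e. the field is missing for that document.
    static Integer indexAsInteger(Object value) {
        if (value instanceof Integer i) {
            return i;
        }
        if (value instanceof String s) {
            try {
                return Integer.parseInt(s); // coerce: numeric strings become numbers
            } catch (NumberFormatException e) {
                return null; // ignore_malformed: skip the field, keep the document
            }
        }
        return null;
    }

    // Sum aggregation with "missing": 0 over the values that were actually indexed.
    static long sumWithMissingZero(List<Object> values) {
        long total = 0;
        for (Object v : values) {
            Integer indexed = indexAsInteger(v);
            total += (indexed == null) ? 0 : indexed;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Object> values = List.of("oops", 5, "7", 3);
        System.out.println("dynamic type from first value: " + dynamicType(values.get(0))); // text
        System.out.println("sum with template mapping: " + sumWithMissingZero(values)); // 15
    }
}
```

In the dynamic case, the single "oops" record arriving first would make the whole month's field text; with the template, it only costs that one document's value.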