elasticsearch, mapping, elasticsearch-analyzers

Elasticsearch mapping: How to analyze or map to numeric fields?


I want to index the month field of a BibTeX entry into Elasticsearch and make it searchable via the range query. This requires the underlying field to be of a numeric datatype; in my case, short would be sufficient.

The BibTeX month field in its canonical form requires a three-character abbreviation, so I tried to use a char_filter like so:

...
"char_filter": {
    "month_char_filter": {
        "type": "mapping",
        "mappings": [
            "jan => 1",
            "feb => 2",
            "mar => 3",
            ...
            "nov => 11",
            "dec => 12"
        ]
    }
...
"normalizer": {
    "month_normalizer": {
        "type": "custom",
        "char_filter": [ "month_char_filter" ]
    },

And set up the mapping like this:

...
"month": {
    "type": "short",
    "normalizer": "month_normalizer"
},
...

But this doesn't work, because the short field type supports neither normalizers nor analyzers.

So what would be the right approach to implement a mapping like the one in the char_filter above, so that range queries become possible?


Solution

  • Your approach intuitively makes sense. However, normalizers can only be applied to keyword fields, and analyzers only to text fields.

    Another approach is to use an ingest pipeline with a script processor that does the mapping at indexing time.

    Below you can find a simulation of such a script processor that would create a new field called monthNum based on the month present in the month field.

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "script": {
              "source": """
              def mapping = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
              ctx.monthNum = mapping.indexOf(ctx.month) + 1;
              """
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "month": "feb"
          }
        },
        {
          "_source": {
            "month": "mar"
          }
        },
        {
          "_source": {
            "month": "jul"
          }
        },
        {
          "_source": {
            "month": "aug"
          }
        },
        {
          "_source": {
            "month": "nov"
          }
        },
        {
          "_source": {
            "month": "dec"
          }
        },
        {
          "_source": {
            "month": "xyz"
          }
        }
      ]
    }
    

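    Resulting documents (note that the unrecognized month xyz ends up with monthNum 0, because indexOf returns -1 when the value is not in the list):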

    {
      "docs" : [
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 2,
              "month" : "feb"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        },
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 3,
              "month" : "mar"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        },
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 7,
              "month" : "jul"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        },
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 8,
              "month" : "aug"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        },
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 11,
              "month" : "nov"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        },
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 12,
              "month" : "dec"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        },
        {
          "doc" : {
            "_index" : "_index",
            "_type" : "_type",
            "_id" : "_id",
            "_source" : {
              "monthNum" : 0,
              "month" : "xyz"
            },
            "_ingest" : {
              "timestamp" : "2019-05-08T12:28:27.006Z"
            }
          }
        }
      ]
    }
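
    To use this outside of the simulation, the pipeline can be stored and attached to an index whose monthNum field is mapped as short, which is what makes the range query possible. The following is only a minimal sketch, assuming Elasticsearch 7 or later (typeless mappings) and Kibana Console syntax. The pipeline name month_pipeline, the index name bibtex, and the document id are hypothetical. The null check, the lowercasing, and the guard for unknown abbreviations are additions to the script above, so that unrecognized months get no monthNum at all instead of 0.

    ```console
    # Store the pipeline (the name "month_pipeline" is an assumption)
    PUT _ingest/pipeline/month_pipeline
    {
      "processors": [
        {
          "script": {
            "source": """
            def mapping = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
            if (ctx.month != null) {
              // indexOf returns -1 for values that are not in the list
              int idx = mapping.indexOf(ctx.month.toLowerCase());
              if (idx >= 0) {
                ctx.monthNum = idx + 1;
              }
            }
            """
          }
        }
      ]
    }

    # Create the index (the name "bibtex" is an assumption), run the pipeline
    # on every indexed document and map the derived field as short
    PUT bibtex
    {
      "settings": {
        "index.default_pipeline": "month_pipeline"
      },
      "mappings": {
        "properties": {
          "month": { "type": "keyword" },
          "monthNum": { "type": "short" }
        }
      }
    }

    # Index a sample entry; the pipeline adds monthNum before it is stored
    PUT bibtex/_doc/1?refresh=true
    {
      "month": "feb"
    }

    # Range query on the numeric field: entries from January through June
    GET bibtex/_search
    {
      "query": {
        "range": {
          "monthNum": { "gte": 1, "lte": 6 }
        }
      }
    }
    ```

    Keeping the original month field next to monthNum leaves the BibTeX abbreviation available for display and exact-match filtering, while the range query only touches the numeric field.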