Search code examples
elasticsearchfilteranalyzersynonym

Elastic synonym usage in aggregations


Situation :

Elastic version used: 2.3.1

I have an elastic index configured like so

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [ 
            "british,english",
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter" 
          ]
        }
      }
    }
  }
}

Which is great, when I query the document and use a query term "english" or "queen" I get all documents matching british and monarch. When I use a synonym term in filter aggregation it doesnt work. For example

In my index I have 5 documents, 3 of them have monarch, 2 of them have queen

POST /my_index/_search
{
  "size": 0,
  "query" : {
      "match" : {
         "status.synonym":{
            "query": "queen",
            "operator": "and"
         }
      }
   },
     "aggs" : {
        "status_terms" : {
            "terms" : { "field" : "status.synonym" }
        },
        "monarch_filter" : {
            "filter" : { "term": { "status.synonym": "monarch" } }
        }
    },
   "explain" : 0
}

The result produces:

Total hits:

  • 5 doc count (as expected, great!)
  • Status terms: 5 doc count for queen (as expected, great!)
  • Monarch filter: 0 doc count

I have tried different synonym filter configuration:

  • queen,monarch
  • queen,monarch => queen
  • queen,monarch => queen,monarch

But the above hasn't changed the results. I was wanting to conclude that maybe you can use filters at query time only but then if terms aggregation is working why shouldn't filter, hence I think its my synonym filter configuration that is wrong. A more extensive synonym filter example can be found here.

QUESTION:

How to use/configure synonyms in filter aggregation?

Example to replicate the case above: 1. Create and configure index:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "wlh,wellhead=>wellwell"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

PUT my_index/_mapping/job
{
  "properties": {
    "title":{
      "type": "string",
      "analyzer": "my_synonyms"
    }
  }
}

2.Put two documents:

PUT my_index/job/1
{
    "title":"wellhead smth else"
}

PUT my_index/job/2
{
    "title":"wlh other stuff"
}

3.Execute a search on wlh which should return 2 documents; have a terms aggregation which should have 2 documents for wellwell and a filter which shouldn't have 0 count:

POST my_index/_search
{
  "size": 0,
  "query" : {
      "match" : {
         "title":{
            "query": "wlh",
            "operator": "and"
         }
      }
   },
     "aggs" : {
        "wlhAggs" : {
            "terms" : { "field" : "title" }
        },
        "wlhFilter" : {
            "filter" : { "term": { "title": "wlh"     } }
        }
    },
   "explain" : 0
}

The results of this query is:

   {
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "wlhAggs": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "wellwell",
               "doc_count": 2
            },
            {
               "key": "else",
               "doc_count": 1
            },
            {
               "key": "other",
               "doc_count": 1
            },
            {
               "key": "smth",
               "doc_count": 1
            },
            {
               "key": "stuff",
               "doc_count": 1
            }
         ]
      },
      "wlhFilter": {
         "doc_count": 0
      }
   }
}

And thats my problem, the wlhFilter should have at least 1 doc count in it.


Solution

  • So with the help of @Byron Voorbach below and his comments this is my solution:

    • I have created a separate field which I use synonym analyser on, as opposed to having a property field (mainfield.property).
    • And most importantly the problem was my synonyms were contracted! I had, for example, british,english => uk. Changing that to british,english,uk solved my issue and the filter aggregation is returning the right number of documents.

    Hope this helps someone, or at least point to the right direction.

    Edit: Oh lord praise the documentation! I completely fixed my issue with Filters (S!) aggregation (link here). In filters configuration I specified Match type of query and it worked! Ended up with something like this:

    "aggs" : {
        "messages" : {
          "filters" : {
            "filters" : {
              "status" :   { "match" : { "cats.saurus" : "monarch"   }},
              "country" : { "match" : { "cats.saurus" : "british" }}
            }
          }
        }
      }