elasticsearch elasticsearch-plugin elasticsearch-analyzers

elasticsearch synonyms analyzer gives 0 results

I am using elasticsearch 7.0.0.

I am trying to set up synonyms with this configuration when creating the index:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address.state": {
        "type": "text",
        "analyzer": "synonym"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

Here's a document inserted into the index:

{
  "name": "Berry's Burritos",
  "description": "Best burritos in New York",
  "address": {
    "street": "230 W 4th St",
    "city": "New York",
    "state": "NY",
    "zip": "10014"
  },
  "location": [
    40.7543385,
    -73.976313
  ],
  "tags": [
    "mexican",
    "tacos",
    "burritos"
  ],
  "rating": "4.3"
}

And here is the content of synonyms.txt:

ny, new york, big apple

When I search for anything in the address.state property, I get an empty result.

Here's the query:

{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "rating": {
            "gte": 4
          }
        }
      },
      "must": {
        "match": {
          "address.state": "ny"
        }
      }
    }
  }
}

Even with ny in the query (the literal value, not a synonym), the result is empty.

Before, when I created the index without mappings, the query returned the document; only the synonyms did not work.

But now, with mappings, the result is empty even though the term is present.

This query is working, though:

{
  "query": {
    "query_string": {
      "query": "tacos",
      "fields": [
        "tags"
      ]
    }
  }
}

I have read many articles and tutorials and got this far.

What am I missing here now?

Solution

When indexing, you pass the value as "state": "NY". Notice the case of NY. The synonym analyzer defined in the settings has only one filter, synonym, and its whitespace tokenizer does not change case. So NY is indexed as-is and does not match any entry in synonyms.txt, because NY is not equal to ny. At search time, ny is expanded to its synonyms, but none of the resulting lowercase tokens equals the indexed token NY, which is why even the plain ny query returns nothing. To make matching case-insensitive, add the lowercase filter before the synonym filter in the synonym analyzer. This ensures that any input text is lowercased first and the synonym filter is applied afterwards. The same happens at search time when you query that field with full-text queries.

So your settings will be as below:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  }
}

No changes are required in mapping.
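
One way to check the change is the _analyze API, which returns the tokens an analyzer produces for a given text. The request below is a minimal sketch. It assumes the body is sent to the _analyze endpoint of your own index (for example GET my_index/_analyze, where my_index is a placeholder name, not one taken from the question) after the index has been created with the settings above.

{
  "analyzer": "synonym",
  "text": "NY"
}

With the lowercase filter in place, the output should contain ny together with the tokens of its synonyms. Without it, you should see only NY. Note that documents indexed before the analyzer change keep their old tokens, so they need to be reindexed.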

Why did it work initially?

When you have not defined any mapping, Elasticsearch dynamically maps address.state as a text field with no explicit analyzer. In that case, Elasticsearch uses the standard analyzer by default, which includes the lowercase token filter, so NY was indexed as ny and the query matched the document.
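
The default behaviour can be observed with the _analyze API. This sketch assumes the body is sent to the index-independent endpoint (GET _analyze), so it needs neither an index nor a synonyms file:

{
  "analyzer": "standard",
  "text": "NY"
}

The standard analyzer lowercases its input, so the response contains the single token ny.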