elasticsearch multi-token keyword synonyms


I'm trying to implement simple multi-token synonyms in Elasticsearch, but I'm not getting the results I expect. Here's the index setup with curl:

curl -XPOST "http://localhost:9200/test" -d'
{
  "mappings": {
    "my_type": {
      "properties": {
        "blah": {
          "type": "string",
          "analyzer": "my_synonyms"
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "type": "synonym",
            "synonyms": [
              "foo bar, fooo bar"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}'

Index a few documents:

curl -XPUT localhost:9200/test/my_type/1 -d '{"blah": "fooo bar"}'
curl -XPUT localhost:9200/test/my_type/2 -d '{"blah": "fooo barr"}'
curl -XPUT localhost:9200/test/my_type/3 -d '{"blah": "foo bar"}'

Now query:

curl -XPOST "http://localhost:9200/test/_search" -d'
{
  "query": {
    "match": {
      "blah": "foo bar"
    }
  }
}'

I expect to get back documents 1 and 3; however, I only get back document 3. Does anyone know what the problem could be?

Upon further inspection, I'm also not getting the expected tokens when calling the analyzer directly:

curl 'localhost:9200/test/_analyze?analyzer=my_synonyms' -d 'fooo bar'

This returns only one token, "fooo bar", whereas I expect two tokens: "fooo bar" and "foo bar".


Solution

  • It looks like if you search for 'fooo bar' instead, you will get documents 1 and 3. To get the results you expected, flip your synonym rule so that it maps the other way:

    "fooo bar => foo bar"

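    The arrow tells ES to replace any term on the left side with the term(s) on the right. Because the field uses the my_synonyms analyzer at both index and query time, "fooo bar" is indexed as "foo bar", and a query for "foo bar" matches it. If you want the synonyms to work in both directions, you can instead use 'fooo bar, foo bar' and make sure expand is not explicitly set to false (it defaults to true).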
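To make the difference between the two rule styles concrete, here is a minimal Python sketch of a toy model of Solr-style synonym rules (the format the synonym filter accepts). The helpers parse_rules, analyze and search are hypothetical names used only for illustration. The model assumes that the keyword tokenizer plus the lowercase filter turn each field value into a single token, and that each rule phrase is matched as one unit. It does not reproduce how Elasticsearch itself tokenizes the rule text, so it illustrates the rule syntax rather than explaining the behavior seen in the question.

```python
from typing import Dict, List

# Toy model of Solr-style synonym rules; hypothetical helpers, not Elasticsearch's code.


def parse_rules(rules: List[str], expand: bool = True) -> Dict[str, List[str]]:
    """Map each source phrase to the phrase(s) it is rewritten to."""
    mapping: Dict[str, List[str]] = {}

    def add(src: str, targets: List[str]) -> None:
        out = mapping.setdefault(src, [])
        for t in targets:
            if t not in out:
                out.append(t)

    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: left-hand phrases are REPLACED by the right-hand ones.
            lhs, rhs = rule.split("=>", 1)
            targets = [p.strip() for p in rhs.split(",")]
            for src in lhs.split(","):
                add(src.strip(), targets)
        else:
            # Equivalence list: with expand=True every phrase maps to all of them;
            # with expand=False every phrase maps to the first one only.
            phrases = [p.strip() for p in rule.split(",")]
            targets = phrases if expand else phrases[:1]
            for src in phrases:
                add(src, targets)
    return mapping


def analyze(text: str, mapping: Dict[str, List[str]]) -> List[str]:
    # Assumed keyword tokenizer + lowercase: the whole input is a single token,
    # which is then looked up as one phrase.
    token = text.lower()
    return mapping.get(token, [token])


def search(query: str, docs: Dict[int, str], mapping: Dict[str, List[str]]) -> List[int]:
    # Same analyzer at index and query time; a doc matches if any token is shared.
    query_tokens = set(analyze(query, mapping))
    return [doc_id for doc_id, text in docs.items()
            if query_tokens & set(analyze(text, mapping))]


docs = {1: "fooo bar", 2: "fooo barr", 3: "foo bar"}

arrow = parse_rules(["fooo bar => foo bar"])
print(analyze("fooo bar", arrow))      # ['foo bar']  (original token replaced)
print(search("foo bar", docs, arrow))  # [1, 3]

bidirectional = parse_rules(["fooo bar, foo bar"])  # expand defaults to True
print(sorted(analyze("fooo bar", bidirectional)))   # ['foo bar', 'fooo bar']
print(search("foo bar", docs, bidirectional))       # [1, 3]
```

In this model, the "=>" rule replaces the "fooo bar" token with "foo bar" rather than keeping both, while the comma-separated list with expand left at its default emits both phrases.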