I'm trying to implement simple multi-token synonyms in Elasticsearch, but I'm not getting the results I expect. Here's the curl command that creates the index:
curl -XPOST "http://localhost:9200/test" -d'
{
  "mappings": {
    "my_type": {
      "properties": {
        "blah": {
          "type": "string",
          "analyzer": "my_synonyms"
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "type": "synonym",
            "synonyms": [
              "foo bar, fooo bar"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}'
Index a few documents:
curl -XPUT localhost:9200/test/my_type/1 -d '{"blah": "fooo bar"}'
curl -XPUT localhost:9200/test/my_type/2 -d '{"blah": "fooo barr"}'
curl -XPUT localhost:9200/test/my_type/3 -d '{"blah": "foo bar"}'
Now query:
curl -XPOST "http://localhost:9200/test/_search" -d'
{
  "query": {
    "match": {
      "blah": "foo bar"
    }
  }
}'
I expect to get back documents 1 and 3, but I only get back document 3. Does anyone know what the problem could be?
Upon further inspection, I'm also not getting the expected tokens when calling the analyzer directly:
curl 'localhost:9200/test/_analyze?analyzer=my_synonyms' -d 'fooo bar'
This returns only one token, "fooo bar", when I'm expecting two tokens: "fooo bar" and "foo bar".
It looks like if you search for 'fooo bar' instead, you will get documents 1 and 3. To get the results you were expecting, you will have to flip your synonym rule to go the other way:
"fooo bar => foo bar"
The arrow tells ES to replace any term on the left side with the terms on the right side. If you want the synonyms to be bidirectional, you can simply use 'fooo bar, foo bar' and make sure expand is not explicitly set to false (it defaults to true).
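As a rough sketch of the bidirectional variant, the filter definition under "analysis" might look like the following. It reuses the my_syn_filt name from the question. "expand": true is written out only for illustration, since that is already the default. Changing analysis settings like this assumes the index is recreated and the documents are indexed again.

```json
{
  "my_syn_filt": {
    "type": "synonym",
    "expand": true,
    "synonyms": [
      "fooo bar, foo bar"
    ]
  }
}
```

With expand set to false, a comma-separated rule like this one would instead map every term in the list to the first term, which is why it should not be disabled here.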