Tags: elasticsearch, synonym, elasticsearch-query, elasticsearch-analyzers

No match on document if the search string is longer than the search field


I am searching for a book by its title.

The title is stored in a document as "Police diaries : stefan zweig".

When I search for "Police" I get the result, but when I search for "Policeman" I do not.

Here is the query:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": [
              "title",
              omitted because irrelevance...
            ],
            "query": "Policeman",
            "fuzziness": "1.5",
            "prefix_length": "2"
          }
        }
      ],
      "must": {
        omitted because irrelevance...
      }
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}

And here is the mapping:

{
    "books": {
        "mappings": {
            "book": {
                "_all": {
                    "analyzer": "nGram_analyzer", 
                    "search_analyzer": "whitespace_analyzer"
                },
                "properties": {
                    "title": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "keyword"
                            },
                            "sort": {
                                "type": "text",
                                "analyzer": "to order in another language, (creates a string with symbols)",
                                "fielddata": true
                            }
                        }
                    }
                }
            }
        }
    }
}

It should be noted that I have documents with the title "some title" that do get hits when I search for "someone title".

I can't figure out why the police book is not showing up.


Solution

  • Your question has two parts:

    1. You want a search for policeman to find the title containing police.
    2. You want to know why documents titled some title match a search for someone title, since that makes you expect the first search to match as well.

    Let me first explain why the second query matches and the first does not, and then show how to make the first one work.

    Your document containing some title produces the tokens below; you can verify this with the Analyze API.

    POST /_analyze
    
    {
        "text": "some title",
        "analyzer" : "standard" --> default analyzer for text field
    }
    

    Generated tokens

    {
        "tokens": [
            {
                "token": "some",
                "start_offset": 0,
                "end_offset": 4,
                "type": "<ALPHANUM>",
                "position": 0
            },
            {
                "token": "title",
                "start_offset": 5,
                "end_offset": 10,
                "type": "<ALPHANUM>",
                "position": 1
            }
        ]
    }
    

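    When you search for someone title with a match query, the query string is analyzed too, using the same analyzer that was applied to the field at index time.

    That produces two tokens, someone and title. The title token matches the indexed title token, which is why the document appears in your results; you can use the Explain API to see in detail how it matches. The search for Policeman, by contrast, produces the single token policeman. The index holds only police for that title, and the two differ by three characters, which is more than the maximum of 2 edits that fuzziness allows, so nothing matches.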
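
    To make the difference between the two searches concrete, here is a minimal Python sketch. It is not Elasticsearch code: tokenize is a rough stand-in for the standard analyzer, edit_distance is plain Levenshtein distance (transpositions and prefix_length are not modeled), and fuzzy_hit and MAX_FUZZY_EDITS are hypothetical names chosen for illustration. It assumes the default OR behaviour, where one matching token is enough for a hit.

```python
import re


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Assumption for this sketch: fuzzy matching allows at most 2 edits per term.
MAX_FUZZY_EDITS = 2


def fuzzy_hit(query, title):
    """True if any query token is within MAX_FUZZY_EDITS of any title token."""
    return any(
        edit_distance(q, t) <= MAX_FUZZY_EDITS
        for q in tokenize(query)
        for t in tokenize(title)
    )


print(edit_distance("policeman", "police"))                      # 3
print(fuzzy_hit("Policeman", "Police diaries : stefan zweig"))   # False
print(fuzzy_hit("someone title", "some title"))                  # True
```

    The second search hits only because title matches exactly; someone is also three edits away from some, so it contributes nothing, just like policeman against police.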

    How to return the police title when searching for policeman

    You need to use a synonym token filter, as shown in the example below.

    Index definition

    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "synonyms": {
                        "filter": [
                            "lowercase",
                            "synonym_filter"
                        ],
                        "tokenizer": "standard"
                    }
                },
                "filter": {
                    "synonym_filter": {
                        "type": "synonym",
                        "synonyms" : ["policeman => police"] --> note this
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "dialog": {
                    "type": "text",
                    "analyzer": "synonyms"
                }
            }
        }
    }
    

    Index a sample document

    {
      "dialog" : "police"
    }
    

    Search query with the term policeman

    {
        "query": {
            "match" : {
                "dialog" : {
                    "query" : "policeman"
                }
            }
        }
    }
    

    And search result

     "hits": [
          {
            "_index": "so_syn",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
              "dialog": "police" --> note source has `police` only.
            }
          }
        ]
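
    The effect of the synonym rule can also be traced outside Elasticsearch. The following Python sketch is only an approximation: analyze mimics the synonyms analyzer above (tokenize, lowercase, then apply the one-way rule policeman => police), and matches assumes the default match behaviour where sharing one token is enough. SYNONYM_RULES, analyze and matches are hypothetical names, not Elasticsearch APIs.

```python
import re

# Same one-way rule as the synonym_filter in the index definition.
SYNONYM_RULES = {"policeman": "police"}


def analyze(text, synonyms=SYNONYM_RULES):
    """Sketch of the 'synonyms' analyzer: tokenize, lowercase, rewrite synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [synonyms.get(tok, tok) for tok in tokens]


def matches(query, document_text):
    """Assumed match semantics: hit when query and document share a token."""
    return bool(set(analyze(query)) & set(analyze(document_text)))


print(analyze("policeman"))                                    # ['police']
print(matches("policeman", "police"))                          # True
print(matches("Policeman", "Police diaries : stefan zweig"))   # True
```

    Because the analyzer rewrites policeman to police before matching, the query token lines up with the indexed token, which is why the document whose source holds only police comes back.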