python django elasticsearch elasticsearch-dsl

ElasticSearch Suggester full-text-search


I'm using django_elasticsearch_dsl.

My Document:

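import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
from elasticsearch_dsl import analyzer

# Custom analyzer: strip HTML, tokenize, lowercase, drop stop words, stem.
html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            # Sub-field queried by the suggester as 'name.suggest'.
            'suggest': fields.CompletionField(),
        }
    )
    ...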

My request:

_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()

I have the following document "names" indexed:

"This is a test"
"this is my test"
"this test"
"Test this"

Now if I search for "This is my test", I receive only

"this is my test"

However, if I search for test, then all I get is

"Test this"

But I want all documents that have "test" in their name.

What am I missing?


Solution

  • Based on the asker's comment, this is another answer, using n-grams.
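
The behaviour described in the question is what a prefix-based suggester would produce: the completion suggester matches from the start of the indexed input, so "test" can only suggest names that begin with "test". The sketch below is a simplified pure-Python model of that idea, not Elasticsearch code; the `complete` helper and the case-insensitive `startswith` comparison are assumptions made for illustration.

```python
# Simplified model of a completion suggester: a name is suggested only when
# its lowercased text starts with the lowercased query text.
NAMES = ["This is a test", "this is my test", "this test", "Test this"]


def complete(prefix, names=NAMES):
    """Return the names that start with `prefix`, ignoring case."""
    needle = prefix.lower()
    return [name for name in names if name.lower().startswith(needle)]


print(complete("test"))             # ['Test this']
print(complete("this is my test"))  # ['this is my test']
```

Finding "test" anywhere in the name is a full-text search problem rather than a completion problem, which is why the example that follows uses a match query on an n-gram analyzed field.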

    Here is a working example with the index mapping, index data, search query, and search result.

    Index Mapping:

    {
      "settings": {
        "analysis": {
          "filter": {
            "ngram_filter": {
              "type": "ngram",
              "min_gram": 4,
              "max_gram": 20
            }
          },
          "analyzer": {
            "ngram_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "ngram_filter"
              ]
            }
          }
        },
        "max_ngram_diff": 50
      },
      "mappings": {
        "properties": {
          "name": {
            "type": "text",
            "analyzer": "ngram_analyzer",
            "search_analyzer": "standard"
          }
        }
      }
    }
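
When the index is created from Python, the same body can be assembled as a plain dict. This is only a sketch: `build_index_body` is a hypothetical helper, not part of any Elasticsearch client. The check it performs reflects the rule that the gap between `max_gram` and `min_gram` may not exceed `max_ngram_diff`, which is why the mapping above raises that setting.

```python
def build_index_body(min_gram=4, max_gram=20, max_ngram_diff=50):
    """Return the index settings and mappings shown above as a dict.

    Hypothetical helper for illustration only.
    """
    if max_gram - min_gram > max_ngram_diff:
        raise ValueError("max_gram - min_gram must not exceed max_ngram_diff")
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "ngram_filter": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "ngram_filter"],
                    }
                },
            },
            "max_ngram_diff": max_ngram_diff,
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    # n-grams at index time, whole words at search time
                    "analyzer": "ngram_analyzer",
                    "search_analyzer": "standard",
                }
            }
        },
    }


body = build_index_body()
print(body["settings"]["max_ngram_diff"])  # 50
```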
    

    Index Data:

    {
      "name": [
        "Test this"
      ]
    }
    
    {
      "name": [
        "This is a test"
      ]
    }
    
    {
      "name": [
        "this is my test"
      ]
    }
    
    {
      "name": [
        "this test"
      ]
    }
    

    Analyze API:

    POST /stof_64281341/_analyze
    
    {
      "analyzer" : "ngram_analyzer",
      "text" : "this is my test"
    }
    

    The following tokens are generated ("is" and "my" are shorter than min_gram, so they produce no tokens):

    {
      "tokens": [
        {
          "token": "this",
          "start_offset": 0,
          "end_offset": 4,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "test",
          "start_offset": 11,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 3
        }
      ]
    }
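
To see where these tokens come from, the analyzer can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: the regular expression `\w+` stands in for the standard tokenizer, and `ngram_filter` and `ngram_analyze` are illustrative names, not Elasticsearch APIs. The order of the emitted n-grams may differ from what Elasticsearch returns.

```python
import re


def ngram_filter(token, min_gram=4, max_gram=20):
    """Return every substring of `token` with length in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def ngram_analyze(text, min_gram=4, max_gram=20):
    """Approximate ngram_analyzer: split into words, lowercase, then n-gram."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        tokens.extend(ngram_filter(word, min_gram, max_gram))
    return tokens


print(ngram_analyze("this is my test"))  # ['this', 'test']
print(len(ngram_analyze("Testing")))     # 10 n-grams, including 'test' and 'ting'
```

Words shorter than `min_gram` vanish from the index entirely, so the choice of `min_gram` decides the shortest search term that can ever match.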
    

    Search Query:

    {
        "query": {
            "match": {
               "name": "test"
            }
        }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 0.2876821,
            "_source": {
              "name": [
                "Test this"
              ]
            }
          },
          {
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "3",
            "_score": 0.2876821,
            "_source": {
              "name": [
                "this is my test"
              ]
            }
          },
          {
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "name": [
                "This is a test"
              ]
            }
          },
          {
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
              "name": [
                "this test"
              ]
            }
          }
        ]
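
All four documents match because the field is n-grammed at index time while the query text is only split into whole words at search time. The following pure-Python sketch models that asymmetry; it is not how Elasticsearch is implemented. The names `DOCS`, `words`, `index_tokens`, and `match` are assumptions for illustration, and the document ids are taken from the result above.

```python
import re

# Document ids and names as they appear in the search result above.
DOCS = {
    "1": "this test",
    "2": "This is a test",
    "3": "this is my test",
    "4": "Test this",
}


def words(text):
    """Stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def index_tokens(text, min_gram=4, max_gram=20):
    """Stand-in for ngram_analyzer: all n-grams of each word."""
    grams = set()
    for word in words(text):
        for start in range(len(word)):
            last_end = min(start + max_gram, len(word))
            for end in range(start + min_gram, last_end + 1):
                grams.add(word[start:end])
    return grams


def match(query, docs=DOCS):
    """Ids of documents sharing at least one term with the query (OR semantics)."""
    query_terms = set(words(query))
    return sorted(
        doc_id for doc_id, name in docs.items()
        if query_terms & index_tokens(name)
    )


print(match("test"))  # ['1', '2', '3', '4']
print(match("tes"))   # [] because 3 characters is below min_gram
```

The second call shows the trade-off: with `min_gram` set to 4, a three-letter query finds nothing, so a smaller `min_gram` is needed if shorter partial matches matter.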
    

    For fuzzy search you can use the search query below, where tst is used in place of test:

    {
      "query": {
        "fuzzy": {
          "name": {
            "value": "tst"
          }
        }
      }
    }
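
The reason "tst" can still find "test" is edit distance: one inserted character turns one into the other. The sketch below computes plain Levenshtein distance and applies the length-based AUTO fuzziness rule (0 to 2 characters allow no edits, 3 to 5 allow one, longer terms allow two). It assumes AUTO fuzziness is in effect and ignores transpositions, which Elasticsearch counts as a single edit by default; the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed by the AUTO rule for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(value, indexed_term):
    """True if `indexed_term` is within the allowed edits of `value`."""
    return edit_distance(value, indexed_term) <= auto_fuzziness(value)


print(edit_distance("tst", "test"))  # 1
print(fuzzy_matches("tst", "test"))  # True
```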