python, django, elasticsearch, elasticsearch-dsl

How can I get all results containing the elasticsearch-dsl query keyword?


When I query my PostDocument, it returns only the results that contain full words from the query. For example, if there were 4 posts:

1. "Post 1"
2. "Post 2"
3. "Posts 3"
4. "Po 4"

and I query it with posts = PostDocument.search().query('match', body="Post"), it will return items 1 and 2; with body="Po" it will return only item 4. How can I write the query so that it returns all the results that contain the keyword? For example, with body="Po" I would get all 4 items.


Solution

  • You can use the edge_ngram tokenizer. It first breaks text down into words whenever it encounters one of a list of specified characters, then emits N-grams of each word, where the start of each N-gram is anchored to the beginning of the word. With min_gram set to 2, "Post" is indexed as "Po", "Pos", and "Post", so a query for "Po" matches it.

    Here is a working example with index mapping, index data, search query, and search result.

    Index Mapping:

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "edge_ngram",
              "min_gram": 2,
              "max_gram": 10,
              "token_chars": [
                "letter",
                "digit"
              ]
            }
          }
        },
        "max_ngram_diff": 50
      },
      "mappings": {
        "properties": {
          "body": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
    

    Index Data:

    {
      "body": "Post 1"
    }
    {
      "body": "Post 2"
    }
    {
      "body": "Posts 3"
    }
    {
      "body": "Po 4"
    }
    

    Search Query:

    {
        "query": {
            "match": {
                "body": "Po"
            }
        }
    }
    

    Search Result:

    "hits": [
      {
        "_index": "64684245",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.1424427,
        "_source": {
          "body": "Po 4"
        }
      },
      {
        "_index": "64684245",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.10158265,
        "_source": {
          "body": "Post 1"
        }
      },
      {
        "_index": "64684245",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.10158265,
        "_source": {
          "body": "Post 2"
        }
      },
      {
        "_index": "64684245",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.088840574,
        "_source": {
          "body": "Posts 3"
        }
      }
    ]
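
To see why "Po" matches all four documents, the tokenization can be approximated in plain Python. This is only a rough sketch, not how Elasticsearch is implemented: the helper names edge_ngrams and matches are made up for illustration, and it assumes the mapping above (min_gram 2, max_gram 10, token_chars of letter and digit) and that the match query analyzes the query text with the same analyzer and ORs the resulting terms, which is the default behaviour.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate an edge_ngram tokenizer with token_chars=[letter, digit].

    Hypothetical helper for illustration only; it is not part of any library.
    """
    tokens = []
    # Split into words on anything that is not a letter or digit.
    for word in re.findall(r"[^\W_]+", text):
        # Emit prefixes of the word from min_gram up to max_gram characters.
        # Words shorter than min_gram (such as "1") produce no tokens.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


def matches(query, body):
    """Approximate a match query: true if any query token is an indexed token."""
    return bool(set(edge_ngrams(query)) & set(edge_ngrams(body)))


posts = ["Post 1", "Post 2", "Posts 3", "Po 4"]

for post in posts:
    print(post, "->", edge_ngrams(post))

hits = [post for post in posts if matches("Po", post)]
print(hits)
```

Two things follow from this sketch. Because the query text is n-grammed as well, a query for "Post" also produces the token "Po" and would therefore match "Po 4" too. And because the analyzer above has no lowercase filter, the tokens are case-sensitive, so a query for "po" would not match any of these documents.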