elasticsearch, search-suggestion

Creating multi-word search suggestions


Can Elasticsearch's edge n-grams (edge_ngram) be set up in a way that builds multi-word phrases as ES indexes crawled data?

I'd like to use those multi-word phrases as search suggestions for a small search app that I'm building.

I'm using Nutch to crawl some sites and using ES to index the crawled data.

I figured that since ES can split on whitespace, this shouldn't be that hard. However, I'm not getting the results I expected, so now I'm asking whether this is even possible.

My ES index is set up like this:

    PUT /_template/autocomplete_1
    {
      "template": "auto*",
      "settings": {
        "index": {
          "number_of_shards": 1,
          "number_of_replicas": 1
        },
        "analysis": {
          "filter": {
            "autocomplete_filter": {
              "type": "edge_ngram",
              "min_gram": "1",
              "max_gram": "30",
              "token_chars": ["letter","digit","whitespace"]
            }
          },
          "analyzer": {
            "autocomplete_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ]
            }
          }
        }
      },
      "mappings": {
        "doc": {
          "_all": {
            "enabled": false
          },
          "properties": {
            "anchor": {
              "type": "string"
            },
            "boost": {
              "type": "string"
            },
            "content": {
              "type": "string",
              "index_analyzer": "autocomplete_analyzer",
              "search_analyzer": "standard"
            },...

"content" is the HTML body field as defined by Nutch. I'm using "content" because I figured it would generate the most phrases.


Solution

  • To create multi-word phrases you need shingles. More specifically, you need the shingle token filter, which combines adjacent tokens into multi-word tokens.
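
As a rough sketch of what that could look like, the "analysis" section of the template's "settings" might define a shingle filter and an analyzer that uses it. The names shingle_filter and shingle_analyzer are made up for illustration, and the shingle sizes (2 to 3 words) are an assumption to tune for your data:

```json
{
  "analysis": {
    "filter": {
      "shingle_filter": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3,
        "output_unigrams": false
      }
    },
    "analyzer": {
      "shingle_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "shingle_filter"
        ]
      }
    }
  }
}
```

With settings along these lines, a text such as "multi word search" should be indexed as the phrases "multi word", "word search" and "multi word search" rather than as prefixes of single words. Setting "output_unigrams" to false drops the single-word tokens; leave it at its default of true if you also want one-word suggestions. The field you want suggestions from (here presumably "content") would then point at this analyzer instead of the edge n-gram one.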