How to build an N-Gram relationship in Elasticsearch


I am new to Elasticsearch, and I want to build a front-end app that shows a list of proverbs. As users browse these proverbs, I want them to find related proverbs from the proverb DB, either ones that share N-grams with the current proverb or ones that are analytically related to it. For example, clicking on

"A watched pot never boils" would bring up the following suggestions:

  • 1-Gram suggestion: "Two pees in a pot"

  • 2-Gram suggestion: "A Watched pot tastes bitter"

  • Analytical suggestion: "Too many cooks spoil the broth"

Is there a way to do that in ES, or do I need to build my own logic?


Solution

  • The 1-gram suggestion works out of the box with a plain match query, and the 2-gram suggestion can easily be achieved with a shingle token filter.

    Here is an attempt:

    PUT test
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "2-grams": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "shingles"
              ]
            }
          },
          "filter": {
            "shingles": {
              "type": "shingle",
              "min_shingle_size": 2,
              "max_shingle_size": 2,
              "output_unigrams": false
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "text": {
            "type": "text",
            "analyzer": "standard",
            "fields": {
              "2gram": {
                "type": "text",
                "analyzer": "2-grams"
              }
            }
          }
        }
      }
    }
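
To make it clearer what the "2-grams" analyzer is expected to put into the index, here is a rough Python sketch of its two steps (lowercasing, then shingles of exactly two words with no unigrams). The regex is only an approximation of the standard tokenizer, and the function names are made up for this illustration; they are not part of any Elasticsearch client.

```python
import re

def standard_tokens(text):
    # Approximation (assumption): lowercase, then split on non-word characters.
    # The real standard tokenizer follows Unicode text segmentation rules.
    return re.findall(r"\w+", text.lower())

def shingles(tokens, size=2):
    # Join every run of `size` adjacent tokens with a space, which is the
    # default separator of the shingle filter. No single words are emitted,
    # mirroring "output_unigrams": false.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

print(shingles(standard_tokens("A Watched pot tastes bitter")))
# ['a watched', 'watched pot', 'pot tastes', 'tastes bitter']
```

On a running cluster, the _analyze API with the "2-grams" analyzer is the authoritative way to check the actual tokens.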
    

    Next, index some documents:

    PUT test/_doc/1
    {
      "text": "Two pees in a pot"
    }
    
    PUT test/_doc/2
    {
      "text": "A Watched pot tastes bitter"
    }
    

    Finally, you can search for 1-gram suggestions using the following query. Both documents come back in the response, because each of them shares at least one term with the query text:

    POST test/_search
    {
      "query": {
        "match": {
          "text": "A watched pot never boils"
        }
      }
    }
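
As a rough illustration of why both documents are returned, the sketch below mimics a match query with its default OR operator: a document is a hit as soon as it shares one analyzed term with the query. It is plain Python rather than Elasticsearch, the tokenization is an approximation of the standard analyzer, and `shared_terms` is a hypothetical helper. It also ignores scoring entirely.

```python
import re

def analyze(text):
    # Approximation (assumption) of the standard analyzer: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))

def shared_terms(query, text):
    # Terms that appear in both the query and the document.
    return sorted(analyze(query) & analyze(text))

docs = {
    1: "Two pees in a pot",
    2: "A Watched pot tastes bitter",
}

for doc_id, text in docs.items():
    print(doc_id, shared_terms("A watched pot never boils", text))
# 1 ['a', 'pot']
# 2 ['a', 'pot', 'watched']
```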
    

    You can also search for 2-gram suggestions using the following query. Only the second document will come up, since it is the only one that shares a two-word sequence with the query text:

    POST test/_search
    {
      "query": {
        "match": {
          "text.2gram": "A watched pot never boils"
        }
      }
    }
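
The same kind of overlap check, done on two-word shingles instead of single words, suggests why only the second document matches on text.2gram. This is again a plain Python approximation under the assumption that the analyzer lowercases and emits shingles of size 2 only; `shared_shingles` is a hypothetical helper, not an Elasticsearch call.

```python
import re

def two_grams(text):
    # Approximation (assumption) of the "2-grams" analyzer:
    # lowercase word tokens, then every pair of adjacent tokens.
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}

def shared_shingles(query, text):
    # Two-word sequences that appear in both the query and the document.
    return sorted(two_grams(query) & two_grams(text))

query = "A watched pot never boils"
print(shared_shingles(query, "Two pees in a pot"))
# []
print(shared_shingles(query, "A Watched pot tastes bitter"))
# ['a watched', 'watched pot']
```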
    

    PS: I'm not sure how the "analytical" suggestion is supposed to work, though. Feel free to provide more details, and I'll update the answer.