elasticsearch elasticsearch-6

Re-using inbuilt language filters?


I saw the question here, which shows how one can create a custom analyzer to have both synonym support and support for languages.

However, it seems to create its own stemmer and stopwords collection as well.

What if I want to add synonyms to the inbuilt "danish" analyzer? Can I refer to the inbuilt Danish stemmer and stopwords filters by name? For example, are they simply called danish_stemmer and danish_stopwords?

Perhaps a list of the inbuilt filters would help. Where can I see their names?


Solution

  • For each pre-built language analyzer, the Elasticsearch documentation gives an example of how to rebuild it as a custom analyzer. For Danish, the example is:

    PUT /danish_example
    {
      "settings": {
        "analysis": {
          "filter": {
            "danish_stop": {
              "type":       "stop",
              "stopwords":  "_danish_" 
            },
            "danish_keywords": {
              "type":       "keyword_marker",
              "keywords":   ["eksempel"] 
            },
            "danish_stemmer": {
              "type":       "stemmer",
              "language":   "danish"
            }
          },
          "analyzer": {
            "rebuilt_danish": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "danish_stop",
                "danish_keywords",
                "danish_stemmer"
              ]
            }
          }
        }
      }
    }
    

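    This is essentially building your own custom analyzer. The names danish_stop, danish_keywords and danish_stemmer are not inbuilt filters; they are defined in the "filter" section of the request itself, on top of the generic stop, keyword_marker and stemmer filter types.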

    The list of available stemmer languages is in the documentation for the stemmer token filter, and the list of pre-built stopword lists (such as _danish_) is in the documentation for the stop token filter.
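
    To get synonyms as well, one option is to add a synonym filter to the same chain. The following is only a sketch: the index name danish_synonym_example, the filter name danish_synonyms, the analyzer name danish_with_synonyms and the synonym pair are all made up for illustration. Placing the synonym filter right after lowercase is also an assumption; the order matters, so adjust it to your data.

    PUT /danish_synonym_example
    {
      "settings": {
        "analysis": {
          "filter": {
            "danish_stop": {
              "type":       "stop",
              "stopwords":  "_danish_"
            },
            "danish_stemmer": {
              "type":       "stemmer",
              "language":   "danish"
            },
            "danish_synonyms": {
              "type":       "synonym",
              "synonyms":   ["bil, automobil"]
            }
          },
          "analyzer": {
            "danish_with_synonyms": {
              "tokenizer":  "standard",
              "filter": [
                "lowercase",
                "danish_synonyms",
                "danish_stop",
                "danish_stemmer"
              ]
            }
          }
        }
      }
    }

    You can then check which tokens the analyzer produces with the _analyze API (the sample text is arbitrary):

    GET /danish_synonym_example/_analyze
    {
      "analyzer": "danish_with_synonyms",
      "text":     "en automobil"
    }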

    Hope that helps!