How to include shingle elasticsearch filter in analyzer with Python elasticsearch_dsl


I am using Elasticsearch for full-text search in a Django application, and the elasticsearch_dsl library from PyPI to interface with the cluster. I am trying to add a shingle filter to my analyzer. I believe I have it working with the default values:

from elasticsearch_dsl import analyzer, tokenizer


main_analyzer = analyzer(
    'main_analyzer',
    tokenizer="standard",
    filter=[
        "lowercase",
        "stop",
        "porter_stem",
        "shingle"
        ]
    )

I would like to change the defaults, for example setting max_shingle_size to 5 instead of the default 2, but I cannot find the syntax for doing this. I have read the documentation, the examples in the Git repository, and some of the source code.


Solution

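  • You need to define a custom token filter and use it in your custom analyzer. In token_filter, the first argument is a name of your choosing, the second is the built-in filter type, and the keyword arguments are that filter's parameters: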

    from elasticsearch_dsl import analysis
    
    main_analyzer = analysis.analyzer(
        "main_analyzer",
        tokenizer="standard",
        filter=[
            "lowercase",
            "stop",
            "porter_stem",
            analysis.token_filter("my_shingle", "shingle", max_shingle_size=5)
        ]
    )
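
To get a feel for what raising max_shingle_size changes, here is a rough pure-Python sketch of the kind of output a shingle filter produces. It is only an approximation for illustration, not Elasticsearch's implementation: the function name `shingles` and its parameters are made up here (loosely mirroring the filter's min_shingle_size, max_shingle_size and output_unigrams options), and it ignores token positions and the filler tokens the real filter inserts where stop words were removed.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=True):
    """Approximate a shingle filter: emit word n-grams of min_size..max_size.

    Hypothetical helper for illustration only; not part of elasticsearch_dsl.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Keep the original single token alongside its shingles.
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for this (or any larger) size
            out.append(" ".join(tokens[start:end]))
    return out


tokens = ["quick", "brown", "fox"]
default_output = shingles(tokens)            # sizes up to 2 only
wider_output = shingles(tokens, max_size=3)  # also includes 3-word shingles
```

With the default sizes, only two-word shingles are added next to the single tokens; with a larger max_size, longer phrases such as "quick brown fox" appear as well, which is what max_shingle_size=5 enables for phrases of up to five words.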