I am using Elasticsearch for full-text search in a Django application, with the elasticsearch_dsl library from PyPI to interface with the cluster. I am trying to add a shingle filter to the analyzer, and I believe I have it working with the default values:
from elasticsearch_dsl import analyzer, tokenizer

main_analyzer = analyzer(
    'main_analyzer',
    tokenizer="standard",
    filter=[
        "lowercase",
        "stop",
        "porter_stem",
        "shingle"
    ]
)
I would like to change the defaults, for example to set max_shingle_size to 5 instead of the default 2, but I cannot find the syntax for doing this. I have read the documentation, the examples in the Git repository, and some of the source code.
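You need to define a custom token filter and use it in your custom analyzer. In analysis.token_filter, the first argument is a name you choose for the filter, the second is the built-in filter type, and the keyword arguments are that filter's parameters: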
from elasticsearch_dsl import analysis

main_analyzer = analysis.analyzer(
    "main_analyzer",
    tokenizer="standard",
    filter=[
        "lowercase",
        "stop",
        "porter_stem",
        analysis.token_filter("my_shingle", "shingle", max_shingle_size=5)
    ]
)
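To see what raising max_shingle_size changes, here is a rough pure-Python sketch of the tokens a shingle filter emits. It is only an illustration, not the Elasticsearch implementation: the shingles helper and its parameter names are made up for this example, it assumes unigrams are kept alongside the shingles, and it ignores token positions and filler tokens for removed stop words.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=True):
    """Roughly mimic the token output of a shingle filter (illustration only)."""
    out = []
    for start in range(len(tokens)):
        # Assumption: the original single tokens are emitted as well.
        if output_unigrams:
            out.append(tokens[start])
        # Emit every word n-gram of size min_size..max_size starting here.
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


# Hypothetical token stream, e.g. after lowercasing, stop-word removal and stemming.
tokens = ["quick", "brown", "fox", "jump"]

default_shingles = shingles(tokens)            # sizes up to 2
wide_shingles = shingles(tokens, max_size=5)   # sizes up to 5

print(default_shingles)
print(wide_shingles)
```

With the size limit at 2 only adjacent pairs are added; with 5 the output also contains longer phrases such as "quick brown fox jump", which is what makes multi-word phrase matches possible.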