I'm building a blog-like app with Flask (based on Miguel Grinberg's Mega-Tutorial) and I'm trying to set up Elasticsearch indexing that would support an autocomplete feature. I'm struggling with setting up the indexing correctly.
I started with a (working) simple indexing mechanism:
from flask import current_app

def add_to_index(index, model):
    # Skip indexing entirely when Elasticsearch is not configured
    if not current_app.elasticsearch:
        return
    # Build the document from the model's searchable fields
    payload = {}
    for field in model.__searchable__:
        payload[field] = getattr(model, field)
    current_app.elasticsearch.index(index=index, id=model.id, body=payload)
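For context, this is fed by models that declare their searchable fields, roughly like this (simplified; the Post model here is just an illustration, mirroring the tutorial's setup):

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Post(db.Model):
    # Only fields listed here are copied into the Elasticsearch document
    __searchable__ = ['body']
    id = db.Column(db.Integer, primary_key=True)
    body = db.Column(db.String(140))

# Called from the tutorial's SearchableMixin event hooks after a commit:
# add_to_index('posts', post)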
and after some fun with Google I found out that my body could look something like this (probably with fewer analyzers, but I'm copying it exactly as I found it somewhere, where the author claims it works):
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {},
        "analyzer": {
          "keyword_analyzer": {
            "filter": ["lowercase", "asciifolding", "trim"],
            "char_filter": [],
            "type": "custom",
            "tokenizer": "keyword"
          },
          "edge_ngram_analyzer": {
            "filter": ["lowercase"],
            "tokenizer": "edge_ngram_tokenizer"
          },
          "edge_ngram_search_analyzer": {
            "tokenizer": "lowercase"
          }
        },
        "tokenizer": {
          "edge_ngram_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 5,
            "token_chars": ["letter"]
          }
        }
      }
    }
  },
  "mappings": {
    field: {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keywordstring": {
              "type": "text",
              "analyzer": "keyword_analyzer"
            },
            "edgengram": {
              "type": "text",
              "analyzer": "edge_ngram_analyzer",
              "search_analyzer": "edge_ngram_search_analyzer"
            },
            "completion": {
              "type": "completion"
            }
          },
          "analyzer": "standard"
        }
      }
    }
  }
}
I figured out that I can modify the original mechanism to something like:
fields = {}
for field in model.__searchable__:
    temp = getattr(model, field)  # the actual content of the field
    fields[field] = {
        "properties": {
            "type": "text",
            "fields": {
                "keywordstring": {
                    "type": "text",
                    "analyzer": "keyword_analyzer"
                },
                "edgengram": {
                    "type": "text",
                    "analyzer": "edge_ngram_analyzer",
                    "search_analyzer": "edge_ngram_search_analyzer"
                },
                "completion": {
                    "type": "completion"
                }
            },
            "analyzer": "standard"
        }
    }
payload = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {},
                "analyzer": {
                    "keyword_analyzer": {
                        "filter": ["lowercase", "asciifolding", "trim"],
                        "char_filter": [],
                        "type": "custom",
                        "tokenizer": "keyword"
                    },
                    "edge_ngram_analyzer": {
                        "filter": ["lowercase"],
                        "tokenizer": "edge_ngram_tokenizer"
                    },
                    "edge_ngram_search_analyzer": {
                        "tokenizer": "lowercase"
                    }
                },
                "tokenizer": {
                    "edge_ngram_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 5,
                        "token_chars": ["letter"]
                    }
                }
            }
        }
    },
    "mappings": fields
}
but that's where I'm lost. Where do I put the actual content (temp = getattr(model, field)) in this document so that the whole thing works? I couldn't find any example or relevant part of the documentation that covers updating an index with slightly more complex mappings like this, so is this even correct/doable? Every guide I see covers bulk indexing, and somehow I fail to make the connection.
I think you are a little bit confused; let me try to explain. What you want is to add one document to Elasticsearch with:
current_app.elasticsearch.index(index=index, id=model.id, body=payload)
This uses the index() method defined in the elasticsearch-py lib. Check the example here: https://elasticsearch-py.readthedocs.io/en/master/index.html#example-usage. The body must be your document, a simple dict, as shown in the example from the doc.
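For instance, sticking to your simple mechanism, indexing one post is nothing more than this (a sketch; the index name and field names are just placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

# The body is the document itself: a plain dict of field -> value,
# exactly what your original add_to_index() builds.
es.index(index='posts', id=1, body={'body': 'my first post about elasticsearch'})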
What you wrote is the settings of the index, which is something different. To use a database analogy: you are putting the schema of a table inside one of its rows.
If you want to apply the given settings, you need to use put_settings on the indices client, as defined here: https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.IndicesClient.put_settings. Note, though, that analysis settings like yours can only be defined when the index is created (or while it is closed), so in practice you create the index with its settings and mappings in one call and only then start adding documents.
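A minimal sketch of that, assuming Elasticsearch 7+ (where mappings no longer have a type name) and reusing the edge_ngram part of your settings; names like 'posts' and 'body' are placeholders:

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

settings = {
    'analysis': {
        'analyzer': {
            'edge_ngram_analyzer': {
                'filter': ['lowercase'],
                'tokenizer': 'edge_ngram_tokenizer',
            },
            'edge_ngram_search_analyzer': {'tokenizer': 'lowercase'},
        },
        'tokenizer': {
            'edge_ngram_tokenizer': {
                'type': 'edge_ngram',
                'min_gram': 2,
                'max_gram': 5,
                'token_chars': ['letter'],
            },
        },
    },
}

mappings = {
    'properties': {
        'body': {
            'type': 'text',
            'analyzer': 'edge_ngram_analyzer',
            'search_analyzer': 'edge_ngram_search_analyzer',
        },
    },
}

# Create the index once, with settings and mappings together;
# after that, index documents with es.index() as usual.
es.indices.create(index='posts', body={'settings': settings, 'mappings': mappings})

Once the index exists, every document you add with index() is analyzed with these analyzers automatically; the actual content never goes inside the mappings.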
I hope it helps you.