I want to tokenize a field (text) in all documents (60k) of an index (post). What is the best approach?
GET /_analyze
{
"analyzer" : "standard",
"text" : ["this is a test"]
}
I need the tokenized text for a tag cloud in my Django app.
By default, all string data is indexed as both text and keyword (as a multi-field) with the standard analyzer. To create the index mapping explicitly, you can use the following API call.
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field_1": {
        "type": "text",
        "analyzer": "standard"
      },
      "my_field_2": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
In that case, all data indexed into my_field_1 and my_field_2
will be eligible for full-text search.
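As a quick check (this snippet is not part of the original example), you can ask the index how a given field is tokenized with the _analyze API and the field parameter, which uses the analyzer from the mapping above:

GET my_index/_analyze
{
  "field": "my_field_1",
  "text": "this is a test"
}

The response lists the individual tokens ("this", "is", "a", "test") that the standard analyzer produces.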
Since you already have documents indexed, you can reprocess them with an ingest pipeline and the update by query API call. I'm sharing an example below.

PUT my_index2/_doc/1
{
  "my_field_1": "musab dogan",
  "my_field_2": "elasticsearch opensearch"
}
PUT _ingest/pipeline/all_into_one
{
  "description": "Copy selected fields to a single new field",
  "processors": [
    {
      "script": {
        "source": """
          def newField = [];
          for (entry in ctx.entrySet()) {
            // Exclude metadata fields starting with an underscore (_index, _id, ...)
            if (!entry.getKey().startsWith("_")) {
              newField.add(entry.getKey() + ": " + entry.getValue());
            }
          }
          ctx['new_field'] = newField;
        """
      }
    }
  ]
}
POST my_index2/_update_by_query?pipeline=all_into_one
GET my_index2/_search
{
  "query": {
    "match": {
      "new_field": "musab"
    }
  }
}
After you run the _update_by_query API call, all existing documents are updated. For new incoming data, you can set the ingest pipeline as the index's default_pipeline.
PUT my_index2/_settings
{
  "index.default_pipeline": "all_into_one"
}
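For the tag cloud itself, one option (not part of the steps above) is a terms aggregation directly on the analyzed tokens of a text field. This requires enabling fielddata on that field, which can use significant heap memory on 60k documents, so treat it as a sketch; the index and field names reuse the example above, and tag_cloud is just an illustrative aggregation name:

PUT my_index2/_mapping
{
  "properties": {
    "my_field_1": {
      "type": "text",
      "fielddata": true
    }
  }
}

GET my_index2/_search
{
  "size": 0,
  "aggs": {
    "tag_cloud": {
      "terms": {
        "field": "my_field_1",
        "size": 100
      }
    }
  }
}

Each bucket in the response is a token with its document count, which your Django app can read to size the tags.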