Tags: python, django, elasticsearch, kibana

How to tokenize a field in ELK?


I want to tokenize a text field in all documents (60k) of an index (post). What is the best approach?

GET /_analyze
{
  "analyzer": "standard",
  "text": ["this is a test"]
}
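
For reference, the standard analyzer lowercases and splits on word boundaries, so _analyze returns one object per token (response abbreviated here):

{
  "tokens": [
    { "token": "this", "position": 0 },
    { "token": "is", "position": 1 },
    { "token": "a", "position": 2 },
    { "token": "test", "position": 3 }
  ]
}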

I need the tokenized text to build a tag cloud in my Django app.


Solution

  • By default, all string data is indexed as both text (analyzed with the standard analyzer) and a keyword sub-field. To create the index mapping explicitly, you can use the following API call.

    PUT my_index
    {
      "mappings": {
        "properties": {
          "my_field_1": {
            "type": "text",
            "analyzer": "standard"
          },
          "my_field_2": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
    

    In that case, all data indexed into my_field_1 and my_field_2 will be eligible for full-text search.
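
    To check how a mapped field will tokenize a given string, you can point _analyze at the index and field (a quick sketch against the my_index mapping above):

    GET my_index/_analyze
    {
      "field": "my_field_1",
      "text": "this is a test"
    }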

    If you already have an index, you can use one of the following approaches:

    1. Use the copy_to feature to copy the values of several fields into one combined field, so they are all searchable in a single place (a sketch follows this list).
    2. Create an ingest pipeline and trigger it with the update by query API call. I'm sharing an example below.
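
    For approach 1, a minimal copy_to mapping sketch (the index and combined_field names are illustrative; note that copy_to only applies to documents indexed after the mapping exists, so existing documents would still need a reindex or an update by query):

    PUT my_index3
    {
      "mappings": {
        "properties": {
          "my_field_1": { "type": "text", "copy_to": "combined_field" },
          "my_field_2": { "type": "text", "copy_to": "combined_field" },
          "combined_field": { "type": "text" }
        }
      }
    }

    The example for approach 2 follows: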

    PUT my_index2/_doc/1
    {
      "my_field_1": "musab dogan",
      "my_field_2": "elasticsearch opensearch"
    }
    
    PUT _ingest/pipeline/all_into_one
    {
      "description": "Copy selected fields to a single new field",
      "processors": [
        {
          "script": {
            "source": """
              // Build a list of "field: value" strings from the document
              def newField = [];
              for (entry in ctx.entrySet()) {
                // Exclude metadata fields starting with underscore
                if (!entry.getKey().startsWith("_")) {
                  newField.add(entry.getKey() + ": " + entry.getValue());
                }
              }
              // Store the combined values in a single searchable field
              ctx['new_field'] = newField;
            """
          }
        }
      ]
    }
    
    POST my_index2/_update_by_query?pipeline=all_into_one
    
    GET my_index2/_search
    {
      "query": {
        "match": {
          "new_field": "musab"
        }
      }
    }
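
    To preview what the pipeline produces without modifying the index, you can also use the simulate endpoint (a sketch; the sample document mirrors the one indexed above):

    POST _ingest/pipeline/all_into_one/_simulate
    {
      "docs": [
        {
          "_source": {
            "my_field_1": "musab dogan",
            "my_field_2": "elasticsearch opensearch"
          }
        }
      ]
    }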
    


    After you run the _update_by_query API call, all existing documents are updated. For new incoming data, you can set the ingest pipeline as the index's default_pipeline:

    PUT my_index2/_settings
    {
      "index.default_pipeline": "all_into_one"
    }