elasticsearch tokenize

In Elasticsearch, how do I see the tokens emitted by a custom analyzer?


I'm trying to write a custom analyzer with its own filter and char_filter. It would help me if I could figure out how to see the tokens emitted by the analyzer/filter/char_filter combo.

Is there an API query I can use to inspect the tokens emitted from a given string with a custom analyzer, filter, and char_filter?


Solution

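  • To check the tokens that were indexed for a field of a document already stored in Elasticsearch, run a search restricted to that document and return the field's terms through a script field. The example uses the filtered query syntax of older Elasticsearch releases: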

    curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "_id": "1770"
                  }
                }
              ]
            }
          }
        }
      },
      "script_fields": {
        "terms": {
          "script": "doc[field].values",
          "params": {
            "field": "input"
          }
        }
      }
    }'

    To see the tokens a custom analyzer emits for an arbitrary string, without indexing a document, call the _analyze API on the index that defines the analyzer:

    GET autosuggest_index_alllocations1/_analyze?analyzer=index_analyzerV2&text=healthy%20tiffins
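
Newer Elasticsearch releases expect the _analyze parameters in a JSON request body rather than in the query string. The following is a minimal Python sketch of that variant, not a definitive client: the helper names (build_analyze_request, extract_tokens, analyze) are made up for illustration, the base URL is assumed to be a local node, and the index and analyzer names are simply the ones from the request above.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Return (url, body_bytes) for an _analyze call using an analyzer defined on `index`."""
    url = f"{base_url.rstrip('/')}/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return url, body


def extract_tokens(response):
    """Pull the token strings out of a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(base_url, index, analyzer, text):
    """Send the request and return the emitted tokens (needs a reachable cluster)."""
    url, body = build_analyze_request(base_url, index, analyzer, text)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_tokens(json.load(reply))
```

Calling analyze("http://localhost:9200", "autosuggest_index_alllocations1", "index_analyzerV2", "healthy tiffins") would return the list of token strings. The exact tokens depend on how index_analyzerV2 and its filter and char_filter are configured. Each entry in the raw response also carries start_offset, end_offset, type, and position, which helps when debugging a char_filter.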