Very simple question. I have an ElasticSearch index with a text field. How do I get the list of all the words indexed for that field? Is there any simple method?
I'm working in Python with the elasticsearch library.
Fetching all the indexed terms of an index is expensive in terms of time and resources, especially if the number of unique terms is large, so be careful about running this on a production cluster.
To do so, Elasticsearch first needs to load all those terms into memory, which is disabled by default for text fields (see the fielddata mapping parameter for more info).
Assuming fielddata is enabled on your field, you can get the list of unique terms, sorted by the number of documents containing each term, using the search query below (the size value caps how many terms are returned):
{
  "size": 0,
  "aggs": {
    "indexed_terms": {
      "terms": {
        "field": "field_name",
        "size": 10000
      }
    }
  }
}
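Since the question mentions Python, here is a minimal sketch of running that aggregation through the elasticsearch client. The helper names (build_terms_query, fetch_indexed_terms) and the index and field names are made up for illustration, and passing the query through body= is an assumption about your client version (newer releases also accept size= and aggs= as keyword arguments).

```python
def build_terms_query(field, size=10000):
    """Build the terms-aggregation body for the given field."""
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            "indexed_terms": {
                "terms": {"field": field, "size": size}
            }
        },
    }


def fetch_indexed_terms(client, index, field, size=10000):
    """Return (term, doc_count) pairs from the terms aggregation.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    """
    response = client.search(index=index, body=build_terms_query(field, size))
    buckets = response["aggregations"]["indexed_terms"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical usage (URL, index and field names are placeholders):
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch("http://localhost:9200")
#   for term, doc_count in fetch_indexed_terms(client, "my_index", "field_name"):
#       print(term, doc_count)
```

Each bucket's doc_count is the number of documents containing the term, not the total number of occurrences.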
If fielddata is not enabled on the field, the query fails with an error like this:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on your fields in order to load field data by uninverting the inverted index. Note that this can use significant memory.
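As the error message suggests, fielddata can be switched on through the field mapping. The following is a rough sketch with the Python client, not a recommendation for production use. The function name enable_fielddata is hypothetical, and it assumes the field is a plain text field with no other mapping options; if your field has an analyzer or other settings, the mapping you send should repeat them alongside fielddata.

```python
def enable_fielddata(client, index, field):
    """Set fielddata=true on an existing text field.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    Assumption: the field has no extra mapping options (analyzer, etc.);
    if it does, include them in the properties sent here.
    """
    body = {
        "properties": {
            field: {"type": "text", "fielddata": True}
        }
    }
    return client.indices.put_mapping(index=index, body=body)


# Hypothetical usage (index and field names are placeholders):
#   enable_fielddata(client, "my_index", "field_name")
```

Keep in mind that fielddata is loaded onto the heap, so this trades memory for the ability to aggregate on the text field.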
If you only need the list of indexed terms for a single document, you can simply use the _termvectors API, which does not require fielddata to be enabled.
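For the single-document case, a sketch with the Python client could look like the following. The function name terms_for_document and the index, id and field names are placeholders; the response layout assumed here (term_vectors, then the field name, then terms) is the one returned by the term vectors API.

```python
def terms_for_document(client, index, doc_id, field):
    """Return the sorted list of terms indexed for one field of one document.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    """
    response = client.termvectors(index=index, id=doc_id, fields=[field])
    # "term_vectors" may be absent or empty if the document or field is missing.
    vectors = response["term_vectors"] if "term_vectors" in response else {}
    terms = vectors.get(field, {}).get("terms", {})
    return sorted(terms)


# Hypothetical usage (index, id and field names are placeholders):
#   print(terms_for_document(client, "my_index", "1", "field_name"))
```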