Let's say that I use this mapping:
PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "testtype": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "english",
          "store": true
        }
      }
    }
  }
}
Now I can index a document:
PUT test/testtype/0
{
  "content": "The Quick brown Fox"
}
And I can retrieve it:
GET test/testtype/0
This returns:
{
  "_index": "test",
  "_type": "testtype",
  "_id": "0",
  "_version": 1,
  "found": true,
  "_source": {
    "content": "The Quick brown Fox"
  }
}
I know that the _source field is supposed to contain only the document exactly as I inserted it, which is why I specified in my mapping that the content field should be stored. So when querying the stored field, I expected it to contain the output of the analyzer, something like this:
"quick brown fox"
But when I query the stored field:
GET test/testtype/_search
{
  "stored_fields": "content"
}
I get back exactly the text I wrote in my document:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "testtype",
        "_id": "0",
        "_score": 1,
        "fields": {
          "content": [
            "The Quick brown Fox"
          ]
        }
      }
    ]
  }
}
So my question is: how can I store the output of my analyzer in Elasticsearch?
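Your question is about the difference between the stored value of a field and the tokens generated from it. "store" is an attribute of the underlying Lucene field, and it only controls whether the original value is kept so it can be retrieved on its own.
A stored field therefore contains exactly the same value as the corresponding field in the "_source" JSON; the analyzer is not applied to it.
The generated tokens live in Lucene's internal index representation and cannot be retrieved as a field. You can, however, inspect them with the _analyze or _termvectors endpoints, or list them with a terms aggregation.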
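A minimal sketch of those three options in Kibana console syntax, reusing the test index, testtype type and document 0 from the question. It assumes the 5.x-era API used above (mapping types in the URL); the aggregation name content_tokens is only an illustrative label. With the english analyzer, the first two requests should report the tokens quick, brown and fox, because "the" is removed as a stop word. For the terms aggregation, note that text fields have fielddata disabled by default, so the aggregation fails until fielddata is enabled through a mapping update that repeats the existing field definition.

```
# 1. Run the field's analyzer over some text (nothing is stored or indexed)
GET test/_analyze
{
  "field": "content",
  "text": "The Quick brown Fox"
}

# 2. Show the terms indexed for document 0
#    (computed on the fly when term vectors are not stored in the mapping)
GET test/testtype/0/_termvectors
{
  "fields": ["content"]
}

# 3a. Enable fielddata on the text field, repeating its existing mapping
PUT test/_mapping/testtype
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "english",
      "store": true,
      "fielddata": true
    }
  }
}

# 3b. Terms aggregation over the indexed tokens of all documents
GET test/testtype/_search
{
  "size": 0,
  "aggs": {
    "content_tokens": {
      "terms": { "field": "content" }
    }
  }
}
```

Fielddata is loaded into the JVM heap, so on a large index the _analyze and _termvectors requests are usually the cheaper way to look at the tokens.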