I used the n-gram tokenizer to create n-grams in Elasticsearch, but I can't retrieve the frequency of each gram, whether bigram or trigram. How can I do that?
It wasn't clear from your question exactly what you were trying to do. It's generally a good idea to post the code you've tried, and as specific a description of your problem as possible.
At any rate, I think this code will come close to doing what you want:
http://sense.qbox.io/gist/f357f15360719299ac556e8082afe26e4e0647d1
I started with the code in this answer, then refined it using the information in the docs for shingle token filters. Here is the mapping I ended up with:
PUT /test_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "evolutionAnalyzer": {
               "tokenizer": "standard",
               "filter": [
                  "standard",
                  "lowercase",
                  "custom_shingle"
               ]
            }
         },
         "filter": {
            "custom_shingle": {
               "type": "shingle",
               "min_shingle_size": "2",
               "max_shingle_size": "3",
               "filler_token": "",
               "output_unigrams": true
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "content": {
               "type": "string",
               "index_analyzer": "evolutionAnalyzer",
               "search_analyzer": "standard",
               "term_vector": "yes"
            }
         }
      }
   }
}
Again, be careful using term vectors in production, since storing them increases the size of the index.
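To get a feel for which terms the custom_shingle filter in the mapping above would index, and how often each one occurs in a single document, here is a small Python sketch. It is only an approximation under stated assumptions: the regex split stands in for the standard tokenizer plus the lowercase filter, the function name shingle_counts is made up for this illustration, and none of it is Elasticsearch's actual implementation. The real per-document counts would come from the term vectors stored for the content field.

```python
import re
from collections import Counter


def shingle_counts(text, min_size=2, max_size=3, output_unigrams=True):
    """Count unigrams and word shingles in one piece of text.

    Hypothetical helper mirroring the custom_shingle settings
    (min_shingle_size=2, max_shingle_size=3, output_unigrams=true).
    """
    # Assumption: lowercase + split on word characters is a rough
    # stand-in for the standard tokenizer and lowercase filter.
    tokens = re.findall(r"\w+", text.lower())

    counts = Counter()
    if output_unigrams:
        counts.update(tokens)

    # Slide a window of each shingle size over the token list.
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            counts[" ".join(tokens[start:start + size])] += 1
    return counts


if __name__ == "__main__":
    counts = shingle_counts("the quick fox and the quick dog")
    # Print every term with its frequency, most frequent first.
    for term, freq in counts.most_common():
        print(freq, term)
```

The bigram "the quick" appears twice in the sample sentence, so it gets a count of 2, while each trigram appears once. Passing output_unigrams=False drops the single-word terms, which is closer to what you want if you only care about bigram and trigram frequencies.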