I want to tokenize "a.b.c" into the parts a, a.b, a.b.c, b.c, b, and c in Elasticsearch. I tried some regexes, but updating the tokenizer is tedious and I'm bad at regex, so I'm asking for help.
I already tried these patterns, but they didn't give me what I want:
[(^\\.)]+
[(.+\\.)]+
[^\\p{L}\\d]+
Try this,
PUT my_sample
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": ".",
          "replacement": "."
        }
      }
    }
  }
}
then,
POST my_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "a.b.c"
}
It produces the following terms:
[ a, a.b, a.b.c ]
You can then handle the remaining parts in your program.
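Note that `path_hierarchy` on its own only emits the prefix chains. To also get the suffix parts (b.c, c), one option is a second analyzer with `"reverse": true` — a sketch; the index and analyzer names here are just illustrative:

```json
PUT my_sample_reverse
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_reverse_analyzer": {
          "tokenizer": "my_reverse_tokenizer"
        }
      },
      "tokenizer": {
        "my_reverse_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": ".",
          "reverse": true
        }
      }
    }
  }
}
```

Running `_analyze` on "a.b.c" with `my_reverse_analyzer` should yield [ a.b.c, b.c, c ]. Combined with the forward analyzer, that covers every requested part except the lone middle segment b, which you would still need to extract in your application (for example with a plain split on ".").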