I am using Elasticsearch 1.2.1. I have a use case in which I would like to create a custom tokenizer that breaks tokens into chunks of at most a given length. For example, with a length of 4, the token "abcdefghij" would be split into "abcd", "efgh" and "ij".
Can I implement this logic without having to write a custom Lucene Tokenizer class?
Thanks in advance.
If the pattern tokenizer cannot do what you need, you will have to write a custom Lucene Tokenizer class yourself and package it as an Elasticsearch plugin. You can refer to this for examples of how Elasticsearch plugins are created for custom analyzers.
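Before writing a plugin, it may be worth trying the pattern tokenizer with a regex that matches the chunks rather than the separators. As far as I know, the pattern tokenizer has a `group` setting, and setting it to 0 makes it emit each whole match as a token instead of splitting on the matches; please verify this against the 1.2.1 documentation, since I have not tested it on that version. The regex itself is easy to check outside Elasticsearch. Below is a minimal Python sketch of the chunking logic only; the function name `chunk_tokens` and the use of `\w` to define a token are my own assumptions, not anything from Elasticsearch.

```python
import re


def chunk_tokens(text, size=4):
    """Split every run of word characters into pieces of at most `size` characters.

    Hypothetical helper for illustration; it only mimics what a
    match-based regex tokenizer would emit for this pattern.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # Greedy quantifier: take up to `size` word characters at a time,
    # so only the last piece of each word can be shorter.
    return re.findall(r"\w{1,%d}" % size, text)


print(chunk_tokens("abcdefghij"))  # ['abcd', 'efgh', 'ij']
```

If the `group` setting behaves as described, the equivalent tokenizer configuration would use the pattern `\w{1,4}` with `group` set to 0. Java and Python regex syntax agree for this simple pattern, but their definitions of `\w` differ for non-ASCII text, so test with your real data.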