Tags: elasticsearch, lucene, tokenize, stringtokenizer, analyzer

elasticsearch custom tokenizer - split token by length


I am using Elasticsearch version 1.2.1. I have a use case in which I would like to create a custom tokenizer that breaks tokens into fixed-length chunks. For example, with a chunk length of 4, the token "abcdefghij" would be split into "abcd", "efgh", and "ij".

I am wondering if I can implement this logic without having to write a custom Lucene Tokenizer class.

Thanks in advance.


Solution

  • For your requirement, first check whether the built-in pattern tokenizer can do it (see the sketch below); if it can't, you'll need to code up a custom Lucene Tokenizer class yourself and package it as a custom Elasticsearch plugin. You can refer to this for examples of how Elasticsearch plugins for custom analyzers are created.
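
Here is a minimal sketch of the pattern tokenizer approach, assuming its group parameter (group 0 tells the tokenizer to emit the regex matches themselves rather than split on them). The index name my_index and the names chunk_tokenizer and chunk_analyzer are placeholders:

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "tokenizer": {
            "chunk_tokenizer": {
              "type": "pattern",
              "pattern": ".{1,4}",
              "group": 0
            }
          },
          "analyzer": {
            "chunk_analyzer": {
              "type": "custom",
              "tokenizer": "chunk_tokenizer"
            }
          }
        }
      }
    }'

The greedy pattern .{1,4} matches four characters at a time, plus whatever remainder is left at the end. You can verify the result with the _analyze API:

    curl -XGET 'localhost:9200/my_index/_analyze?analyzer=chunk_analyzer' -d 'abcdefghij'

which should return the tokens abcd, efgh, and ij. One caveat: the pattern tokenizer runs over the whole field value, so whitespace ends up inside the chunks. If you need each whitespace-separated word chunked individually, that is exactly the case where a custom Lucene Tokenizer becomes necessary.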