I have been using elasticsearch-analysis-kuromoji to search Japanese text, but I am seeing two very strange behaviours. First, a multi-character query such as '輸出貿易' returns nothing unless I pass it as '輸 出 貿 易', with a space between each character. Second, characters such as 'ント' do not seem to be searched on at all.
This is my configuration:
.field("type", "kuromoji_tokenizer")
.field("mode", "extended")
.field("discard_punctuation", "false")
.field("type", "custom")
.field("tokenizer", "kuromoji_user_dict")
Am I configuring it wrong, or do I need a different tokeniser for characters like '輸出貿易' and 'ント'?
Thank you.
After some online research and some help from the elasticsearch-analysis-kuromoji team, I was able to find the problem. Even though I had created the analyzer and told the query to use it, I also needed to add the mapping so that the fields use that analyzer, like so:
XContentBuilder xbMapping =
.field("type", "string")
.field("type", "string")
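For anyone who wants to see the overall shape of such a mapping, here is a minimal sketch of the JSON it would produce. This is not my exact code: the type name `document`, the field names `title` and `body`, and the analyzer name `kuromoji_analyzer` are all hypothetical placeholders, and the JSON is built with plain strings rather than `XContentBuilder` so that it runs without Elasticsearch on the classpath. It assumes the older `"type": "string"` field type used above.

```java
public class KuromojiMappingSketch {

    // Builds the JSON for one string field that is analyzed with the given analyzer.
    // Both names are placeholders supplied by the caller.
    static String fieldMapping(String fieldName, String analyzerName) {
        return "\"" + fieldName + "\":{\"type\":\"string\",\"analyzer\":\""
                + analyzerName + "\"}";
    }

    // Wraps the field mappings in the usual type/properties envelope.
    static String buildMapping(String typeName, String analyzerName, String... fieldNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"").append(typeName).append("\":{\"properties\":{");
        for (int i = 0; i < fieldNames.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fieldMapping(fieldNames[i], analyzerName));
        }
        sb.append("}}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical type, analyzer and field names, for illustration only.
        System.out.println(buildMapping("document", "kuromoji_analyzer", "title", "body"));
    }
}
```

The important part is the `analyzer` entry on each field: without it the field is indexed with the default analyzer, so the tokens stored in the index may not line up with the tokens the kuromoji analyzer produces at query time.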