I need to tokenize the string 36-3031.00|36-3021.00 into 36-3031.00 and 36-3021.00, using | as the delimiter.
I have tried this:
PUT text
{
  "test1": {
    "settings": {
      "analysis" : {
        "tokenizer" : {
          "pipe_tokenizer" : {
            "type" : "pattern",
            "pattern" : "|"
          }
        },
        "analyzer" : {
          "pipe_analyzer" : {
            "type" : "custom",
            "tokenizer" : "pipe_tokenizer"
          }
        }
      }
    },
    "mappings": {
      "mytype": {
        "properties": {
          "text": {
            "type": "string",
            "analyzer": "pipe_analyzer"
          }
        }
      }
    }
  }
}
But it doesn't produce the expected tokens. Can anyone help with this use case?
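The following is the correct mapping to use. The index name belongs in the REST path (PUT test1) rather than wrapped around the request body, and the | character must be escaped. The pattern tokenizer's pattern is a Java regular expression, and an unescaped | is an alternation of two empty patterns, so it matches the empty string at every position instead of the literal pipe. Because the backslash must itself be escaped inside a JSON string, the pattern is written as "\\|":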
DELETE test1

PUT test1
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "pipe_tokenizer": {
          "type": "pattern",
          "pattern": "\\|"
        }
      },
      "analyzer": {
        "pipe_analyzer": {
          "type": "custom",
          "tokenizer": "pipe_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "pipe_analyzer"
        }
      }
    }
  }
}

POST /test1/mytype/1
{"text":"36-3031.00|36-3021.00"}

GET /test1/_analyze
{"field":"text","text":"36-3031.00|36-3021.00"}
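To see why the escape matters, here is a small Java sketch using plain java.util.regex (not Elasticsearch itself). The class PipeSplit and its tokenize method are hypothetical helpers; tokenize splits the text on every regex match and drops empty pieces, which is assumed here to approximate what the pattern tokenizer does in its default split mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeSplit {
    // Hypothetical helper: emit the text between consecutive regex matches,
    // skipping empty pieces (an approximation of a split-mode pattern tokenizer).
    static List<String> tokenize(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(text.substring(start, m.start()));
            }
            start = m.end();
        }
        if (start < text.length()) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "36-3031.00|36-3021.00";
        // Unescaped "|" matches the empty string everywhere, so every
        // character (including the pipe itself) becomes its own token.
        System.out.println(tokenize(input, "|"));
        // prints [3, 6, -, 3, 0, 3, 1, ., 0, 0, |, 3, 6, -, 3, 0, 2, 1, ., 0, 0]

        // "\\|" in Java source (like "\\|" in the JSON settings) is the regex \|,
        // which matches only a literal pipe character.
        System.out.println(tokenize(input, "\\|"));
        // prints [36-3031.00, 36-3021.00]
    }
}
```

With the unescaped pattern the input falls apart into single characters, which is consistent with the original mapping not producing the expected tokens; with \| you get exactly the two codes.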