I have alphanumeric codes like Hcc18, HCC23 and I23, which I want to store in Elasticsearch. On top of this, I want to build the following two features:
My current Elasticsearch mapping is:
{
  "mappings": {
    "properties": {
      "code": {
        "type": "text",
        "analyzer": "autoanalyer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "autoanalyer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "autotoken": {
          "type": "simple_pattern",
          "pattern": "[0-9]+"
        }
      }
    }
  }
}
The query being made:
{
  "min_score": 0.1,
  "from": 0,
  "size": 10000,
  "query": {
    "bool": {
      "should": [
        { "match": { "code": search_term } }
      ]
    }
  }
}
I am facing two problems with this approach:
Let's say I search for I420. Because the mapping is based only on digits, I get all the codes related to the number 420, but the exact match I420 doesn't come out on top.
With this mapping, how will I be able to achieve the autocomplete feature mentioned above?
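To see why the exact match cannot come out on top with a digits-only analysis, here is a rough Python approximation of the autotoken tokenizer (simple_pattern with [0-9]+, which emits each run of digits as one token and drops everything else). It assumes the analyzer really uses autotoken, as the first problem describes, and uses Python's re module in place of Lucene's regex engine; the function name digit_tokens is made up for illustration.

```python
import re

# Approximation of the "autotoken" tokenizer: simple_pattern with "[0-9]+".
# Each run of digits becomes one token; letters are simply not captured.
DIGITS = re.compile(r"[0-9]+")


def digit_tokens(code: str) -> list[str]:
    """Hypothetical helper: the tokens a digits-only analyzer would produce."""
    return DIGITS.findall(code)


for code in ["I420", "hcc420", "Hcc18", "HCC23", "I23"]:
    print(code, "->", digit_tokens(code))
```

Both I420 and hcc420 reduce to the single token 420, so a match query on that field has nothing left to tell them apart, and both documents get essentially the same score.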
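You had multiple requirements, and all of them can be achieved using multi-fields (a keyword field with a digits-only text sub-field) together with a prefix query and a match query.
Below is a step-by-step example using the OP's data and queries.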
Index mapping (my_analyzer reuses your autotoken tokenizer to extract only the numbers):
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "autotoken"
        }
      },
      "tokenizer": {
        "autotoken": {
          "type": "simple_pattern",
          "pattern": "[0-9]+"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "code": {
        "type": "keyword",
        "fields": {
          "number": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}
Sample documents indexed:
{ "code": "hcc420" }
{ "code": "HCC23" }
{ "code": "I23" }
{ "code": "I420" }
{ "code": "I421" }
{ "code": "hcc420" }
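For the sample documents, the two fields of the mapping above end up holding roughly the terms below. This is a plain Python sketch of the analysis, not an Elasticsearch API: the keyword field stores the whole value unchanged, and code.number keeps only the digit runs produced by my_analyzer. The helper name indexed_terms is hypothetical.

```python
import re

# Distinct codes from the sample documents above.
SAMPLE_CODES = ["hcc420", "HCC23", "I23", "I420", "I421"]


def indexed_terms(code: str) -> dict[str, list[str]]:
    """Hypothetical helper: approximate terms stored per field for one document."""
    return {
        # keyword field: the whole value is one term, case preserved
        "code": [code],
        # code.number sub-field: my_analyzer keeps only runs of digits
        "code.number": re.findall(r"[0-9]+", code),
    }


for code in SAMPLE_CODES:
    print(indexed_terms(code))
```

The keyword term is what the prefix query works on (exact and autocomplete matches), while the digit tokens are what the match query on code.number works on (number search).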
Search query for I420 (it should bring 2 docs from the sample data, I420 and hcc420, but I420 must have a higher score as it is the exact match):
{
  "query": {
    "bool": {
      "should": [
        {
          "prefix": {
            "code": {
              "value": "I420"
            }
          }
        },
        {
          "match": {
            "code.number": "I420"
          }
        }
      ]
    }
  }
}
Search result (note that the exact match I420 has the higher score):
"hits": [
  {
    "_index": "so_number",
    "_type": "_doc",
    "_id": "4",
    "_score": 2.0296195,
    "_source": {
      "code": "I420"
    }
  },
  {
    "_index": "so_number",
    "_type": "_doc",
    "_id": "7",
    "_score": 1.0296195,
    "_source": {
      "code": "hcc420"
    }
  }
]
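The score gap comes from the bool/should clauses adding up: I420 satisfies both the prefix clause and the match clause, while hcc420 satisfies only the match clause. The toy model below is plain Python, not Elasticsearch's scoring: it only records which clauses each sample document would satisfy for the three searches in this answer and ranks by how many, instead of computing real BM25 scores. The names matched_clauses and rank are hypothetical.

```python
import re

# Distinct codes from the sample documents.
CODES = ["hcc420", "HCC23", "I23", "I420", "I421"]


def digit_tokens(text: str) -> set[str]:
    # Same digits-only analysis as my_analyzer (simple_pattern "[0-9]+"),
    # applied to both the stored value and the search term.
    return set(re.findall(r"[0-9]+", text))


def matched_clauses(code: str, term: str) -> list[str]:
    """Which of the two should clauses one document would satisfy."""
    clauses = []
    # prefix on the keyword field: case-sensitive "starts with" on the raw value
    if code.startswith(term):
        clauses.append("prefix")
    # match on code.number: one shared digit token is enough (OR semantics)
    if digit_tokens(code) & digit_tokens(term):
        clauses.append("match")
    return clauses


def rank(term: str) -> list[tuple[str, list[str]]]:
    hits = [(code, matched_clauses(code, term)) for code in CODES]
    hits = [hit for hit in hits if hit[1]]  # drop docs matching no clause
    # More satisfied clauses means more score contributions summed together.
    return sorted(hits, key=lambda hit: len(hit[1]), reverse=True)


for term in ["I420", "I42", "420"]:
    print(term, rank(term))
```

This lines up with the scores shown: 2.0296195 = 1.0 (prefix) + 1.0296195 (match). It also suggests why both I42 hits score exactly 1.0: the term analyzes to the digit token 42, which is not equal to 420 or 421, so only the prefix clause matches, and it contributes a constant score.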
So, for autocomplete, searching for I42 must bring I420 and I421 from the sample docs:
{
  "query": {
    "bool": {
      "should": [
        {
          "prefix": {
            "code": {
              "value": "I42"
            }
          }
        },
        {
          "match": {
            "code.number": "I42"
          }
        }
      ]
    }
  }
}
Search result:
"hits": [
  {
    "_index": "so_number",
    "_type": "_doc",
    "_id": "4",
    "_score": 1.0,
    "_source": {
      "code": "I420"
    }
  },
  {
    "_index": "so_number",
    "_type": "_doc",
    "_id": "5",
    "_score": 1.0,
    "_source": {
      "code": "I421"
    }
  }
]
Let's take another example, a number search: searching for 420 must bring hcc420 and I420.
{
  "query": {
    "bool": {
      "should": [
        {
          "prefix": {
            "code": {
              "value": "420"
            }
          }
        },
        {
          "match": {
            "code.number": "420"
          }
        }
      ]
    }
  }
}
And whoa, again it gave the expected results 😀
Result
------
"hits": [
  {
    "_index": "so_number",
    "_type": "_doc",
    "_id": "4",
    "_score": 1.0296195,
    "_source": {
      "code": "I420"
    }
  },
  {
    "_index": "so_number",
    "_type": "_doc",
    "_id": "7",
    "_score": 1.0296195,
    "_source": {
      "code": "hcc420"
    }
  }
]
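All three searches in this answer use the same body and differ only in the search term, so the client side can generate it from the user's input. A minimal Python sketch: build_code_query is a hypothetical helper name, and the returned dict still has to be sent with whichever Elasticsearch client you use (not shown here).

```python
import json


def build_code_query(term: str) -> dict:
    """Hypothetical helper: the bool/should search body used in the examples."""
    return {
        "query": {
            "bool": {
                "should": [
                    # exact and autocomplete matches on the raw keyword value
                    {"prefix": {"code": {"value": term}}},
                    # number-based matches on the digits-only sub-field
                    {"match": {"code.number": term}},
                ]
            }
        }
    }


print(json.dumps(build_code_query("I42"), indent=2))
```

Keeping both clauses under should (rather than must) is what lets a document match either way, while documents that satisfy both clauses add the two scores together and rank higher.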