I am using Elasticsearch 7.x and have created an accounts index with the following mapping.
curl --location --request PUT 'http://localhost:9200/accounts' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"properties": {
"type": {"type": "keyword"},
"id": {"type": "keyword"},
"label": {"type": "keyword"},
"lifestate": {"type": "keyword"},
"name": {"type": "keyword"},
"users": {"type": "text"}
}
}
}'
I'm storing users as an array. In my use case, an account can have any number of users, so I store them in the following format.
curl --location --request PUT 'http://localhost:9200/accounts/_doc/account3' \
--header 'Content-Type: application/json' \
--data-raw '{
"id" : "account_uuid",
"name" : "Account_Description",
"users" : [
"id:6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
"id:9611e2be-784f-4a07-b5de-564b3820a660~~status:INACTIVE"
]
}'
To search by user ID and status, I've created a pattern analyzer that splits on the ~~ separator, as follows.
curl --location --request PUT 'http://localhost:9200/accounts/_settings' \
--header 'Content-Type: application/json' \
--data-raw '{
"settings": {
"analysis": {
"analyzer": {
"p_analyzer": {
"type": "pattern",
"pattern" :"~~"
}
}
}
}
}'
The search query is:
curl --location --request GET 'http://localhost:9200/accounts/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"filter": [
{ "term": {"id": "account_uuid"} },
{ "match" : {"users" : {
"query" : "id:<user_id>",
"analyzer" : "p_analyzer"
}}}
]
}
}
}'
This works if the user ID is a plain string, that is, when it is not stored in UUID format. It does not work when the ID is a UUID. How can I make this work?
Modify your analyzer so that its pattern also splits on the hyphen (-). This should solve your issue, because the UUID is then broken into separate tokens.
{
"settings": {
"analysis": {
"analyzer": {
"p_analyzer": {
"type": "pattern",
"pattern": "~~|-", --> note hypen is included `-`
"lowercase": true
}
}
}
}
}
With the above analyzer, the following tokens are generated:
POST /your-index/_analyze
{
"text" : "6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
"analyzer" : "my_email_analyzer"
}
Generated tokens
{
"tokens": [
{
"token": "6de57db5",
"start_offset": 0,
"end_offset": 8,
"type": "word",
"position": 0
},
{
"token": "8fdb",
"start_offset": 9,
"end_offset": 13,
"type": "word",
"position": 1
},
{
"token": "4a39",
"start_offset": 14,
"end_offset": 18,
"type": "word",
"position": 2
},
{
"token": "ab46",
"start_offset": 19,
"end_offset": 23,
"type": "word",
"position": 3
},
{
"token": "21af623692ea",
"start_offset": 24,
"end_offset": 36,
"type": "word",
"position": 4
},
{
"token": "status:active",
"start_offset": 38,
"end_offset": 51,
"type": "word",
"position": 5
}
]
}
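To see why these tokens come out, here is a rough Python approximation of what the pattern analyzer does with the "~~|-" pattern. It is only a sketch: the function name pattern_analyze is made up for illustration, and the real analyzer runs inside Elasticsearch on a Java regex, so treat this as a way to reason about the split, not as the actual implementation.

```python
import re

def pattern_analyze(text, pattern=r"~~|-", lowercase=True):
    """Approximate a pattern analyzer: split on the regex,
    drop empty pieces, and optionally lowercase each token."""
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

tokens = pattern_analyze("6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE")
print(tokens)
# ['6de57db5', '8fdb', '4a39', 'ab46', '21af623692ea', 'status:active']
```

With the original "~~" pattern, the same helper would keep the whole UUID as a single token, because the hyphen is no longer a split point.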
Now a search for 6de57db5-8fdb-4a39-ab46-21af623692ea is broken into 6de57db5, 8fdb, 4a39, and so on. These match the tokens generated at index time (assuming the users field is indexed with the same analyzer), so the document is returned in the search results.
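The matching step can be sketched in Python as a token overlap check. This is a simplified model, not how Elasticsearch scores documents: the analyze helper is a hypothetical stand-in for the analyzer, and positions are ignored. It relies on one fact about the match query, namely that its default operator is OR, so a single shared token is enough for a hit unless "operator": "and" is set.

```python
import re

def analyze(text, pattern=r"~~|-"):
    # Rough stand-in for the pattern analyzer: split, drop empties, lowercase.
    return [t.lower() for t in re.split(pattern, text) if t]

# Tokens stored for one entry of the users array.
indexed = set(analyze("6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE"))

# Tokens produced from the search text.
query = analyze("6de57db5-8fdb-4a39-ab46-21af623692ea")

matches_or = any(t in indexed for t in query)   # default match behaviour (OR)
matches_and = all(t in indexed for t in query)  # behaviour with "operator": "and"
print(matches_or, matches_and)  # True True

# A different UUID that happens to share one segment ("8fdb").
other = analyze("00000000-8fdb-0000-0000-000000000000")
other_or = any(t in indexed for t in other)
other_and = all(t in indexed for t in other)
print(other_or, other_and)  # True False
```

The second case suggests that, under the default OR behaviour, a UUID sharing just one segment could also match, so requiring all tokens may be worth considering if exact ID matches are needed.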