Consider a query such as this one:
{
  "size": 200,
  "query": {
    "bool" : {
      ....
    }
  },
  "sort": {
    "_script" : {
      "script" : {
        "source" : "params._source.participants[0].participantEmail",
        "lang" : "painless"
      },
      "type" : "string",
      "order" : "desc"
    }
  }
}
This query works for almost every document, but a few of them do not end up in the correct position. How can that be?
The last documents come back in this order (I'm displaying the first item of the participants array of each doc):
shiend@....
denys@...
Lynn@...
How is this possible? I don't know where to look. Is the sort query wrong?
Settings:
"myindex" : {
  "settings" : {
    "index" : {
      "refresh_interval" : "30s",
      "number_of_shards" : "5",
      "provided_name" : "myindex",
      "creation_date" : "1600703588497",
      "analysis" : {
        "filter" : {
          "english_keywords" : {
            "keywords" : [
              "example"
            ],
            "type" : "keyword_marker"
          },
          "english_stemmer" : {
            "type" : "stemmer",
            "language" : "english"
          },
          "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis/UK_US_Sync_2.csv",
            "updateable" : "true"
          },
          "english_possessive_stemmer" : {
            "type" : "stemmer",
            "language" : "possessive_english"
          },
          "english_stop" : {
            "type" : "stop",
            "stopwords" : "_english_"
          },
          "my_katakana_stemmer" : {
            "type" : "kuromoji_stemmer",
            "minimum_length" : "4"
          }
        },
        "normalizer" : {
          "custom_normalizer" : {
            "filter" : [
              "lowercase",
              "asciifolding"
            ],
            "type" : "custom",
            "char_filter" : [ ]
          }
        },
        "analyzer" : {
          "somevar_english" : {
            "filter" : [
              "english_possessive_stemmer",
              "lowercase",
              "english_stop",
              "english_keywords",
              "english_stemmer",
              "asciifolding",
              "synonym"
            ],
            "tokenizer" : "standard"
          },
          "myvar_chinese" : {
            "filter" : [
              "porter_stem"
            ],
            "tokenizer" : "smartcn_tokenizer"
          },
          "myvar" : {
            "filter" : [
              "my_katakana_stemmer"
            ],
            "tokenizer" : "kuromoji_tokenizer"
          }
        }
      },
      "number_of_replicas" : "1",
      "uuid" : "d0LlBVqIQGSk4afEWFD",
      "version" : {
        "created" : "6081099",
        "upgraded" : "6081299"
      }
    }
  }
}
Mapping:
{
  "myindex": {
    "mappings": {
      "doc": {
        "dynamic_date_formats": [
          "yyyy-MM-dd HH:mm:ss.SSS"
        ],
        "properties": {
          "all_fields": {
            "type": "text"
          },
          "participants": {
            "type": "nested",
            "include_in_root": true,
            "properties": {
              "participantEmail": {
                "type": "keyword",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256,
                    "normalizer": "custom_normalizer"
                  }
                },
                "copy_to": [
                  "all_fields"
                ]
              },
              "participantType": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256,
                    "normalizer": "custom_normalizer"
                  }
                },
                "copy_to": [
                  "all_fields"
                ]
              }
            }
          }
        }
      }
    }
  }
}
EDIT: Maybe it's because the email Lynn@.. starts with an uppercase letter?
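Indeed, strings are sorted in lexical (character code) order, so in ascending order every uppercase letter comes before every lowercase one, and in descending order it is the other way around. That is why Lynn@... ends up after shiend@.... and denys@... in your descending sort.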
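To make the ordering concrete, here is a small Python sketch (not Elasticsearch code) that sorts the three local parts from the question by character code. It assumes plain ASCII letters, for which Python's string comparison gives the same result as the comparison described above; the variable names are only illustrative.

```python
# Local parts of the three addresses shown in the question.
local_parts = ["denys", "Lynn", "shiend"]

# Python compares strings by code point, so a descending sort puts
# lowercase initials ('s' = 115, 'd' = 100) before uppercase 'L' (76).
descending = sorted(local_parts, reverse=True)
first_letter_codes = [ord(name[0]) for name in descending]

print(descending)          # ['shiend', 'denys', 'Lynn']
print(first_letter_codes)  # [115, 100, 76]
```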
What you can do is lowercase the email in your sort script, so that the comparison no longer depends on case:
"sort": {
  "_script" : {
    "script" : {
      "source" : "params._source.participants[0].participantEmail.toLowerCase()",
      "lang" : "painless"
    },
    "type" : "string",
    "order" : "desc"
  }
}
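As a rough illustration of what the lowercased sort key changes, the same comparison can be mimicked in Python. This is only a sketch of the ordering logic, not of how Elasticsearch executes the script; `key=str.lower` stands in for the `toLowerCase()` call, and only the local parts from the question are used.

```python
# Local parts of the three addresses shown in the question.
local_parts = ["denys", "Lynn", "shiend"]

# Sorting on the lowercased value makes the comparison case-insensitive,
# so "Lynn" now falls between "shiend" and "denys" in descending order.
case_insensitive_desc = sorted(local_parts, key=str.lower, reverse=True)

print(case_insensitive_desc)  # ['shiend', 'Lynn', 'denys']
```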