I have the mapping below, and it works normally:
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "0",
"analysis": {
"filter": {
"stemmer_plural_portugues": {
"name": "minimal_portuguese",
"stopwords" : ["http", "https", "ftp", "www"],
"type": "stemmer"
},
"synonym_filter": {
"type": "synonym",
"lenient": true,
"synonyms_path": "analysis/synonym.txt",
"updateable" : true
},
"shingle_filter": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
},
"analyzer": {
"analyzer_customizado": {
"filter": [
"lowercase",
"stemmer_plural_portugues",
"asciifolding",
"synonym_filter",
"shingle_filter"
],
"tokenizer": "lowercase"
}
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "long"
},
"data": {
"type": "date"
},
"quebrado": {
"type": "byte"
},
"pgrk": {
"type": "integer"
},
"url_length": {
"type": "integer"
},
"title": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"description": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"url": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
}
I index the doc below:
{
"title": "rocket 1960",
"description": "space",
"url": "www.nasa.com"
}
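For reference, the doc can be indexed like this (the index name my-index and id 1 are just the ones from my tests):

```json
PUT /my-index/_doc/1
{
  "title": "rocket 1960",
  "description": "space",
  "url": "www.nasa.com"
}
```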
If I execute the query below using the AND operator, it finds the doc as expected, because all of the searched words exist in the doc:
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "space nasa rocket",
"type": "cross_fields",
"fields": [
"title",
"description",
"url"
],
"operator": "and"
}
}
}
But if I also add "1960" to the search, as in the query below, it returns nothing:
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "1960 space nasa rocket",
"type": "cross_fields",
"fields": [
"title",
"description",
"url"
],
"operator": "and"
}
}
}
I found that the "lowercase" tokenizer does not generate numeric tokens. So I changed my tokenizer to "standard", and the numeric token 1960 is now generated.
But then the query still finds nothing, because the url field containing the link www.nasa.com no longer produces the separate tokens "www", "nasa", "com"; the generated token is the entire link, www.nasa.com.
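The difference between the two tokenizers can be verified with the _analyze API; the requests below are a sketch of that check:

```json
POST /_analyze
{
  "tokenizer": "lowercase",
  "text": "rocket 1960 www.nasa.com"
}

POST /_analyze
{
  "tokenizer": "standard",
  "text": "rocket 1960 www.nasa.com"
}
```

The first request returns the tokens rocket, www, nasa, com (the digit-only run 1960 is dropped, since the lowercase tokenizer splits on anything that is not a letter), while the second returns rocket, 1960, www.nasa.com (the dotted hostname stays whole).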
The query only works if I enter the full URL www.nasa.com, as shown below:
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "1960 space www.nasa.com rocket",
"type": "cross_fields",
"fields": [
"title",
"description",
"url"
],
"operator": "and"
}
}
}
If I define another analyzer with the "lowercase" tokenizer just for the url field, the link www.nasa.com is again split into the separate tokens "www", "nasa", "com".
But then my query below finds nothing, because the url field now has a different analyzer than the title and description fields: cross_fields groups fields that share the same analyzer, and with AND every term must match within each group. The query below only works if I use the OR operator, but I need the AND operator:
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "1960 space nasa rocket",
"type": "cross_fields",
"fields": [
"title",
"description",
"url"
],
"operator": "and"
}
}
}
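For reference, the separate url analyzer I describe above looks roughly like this (the analyzer name analyzer_url is illustrative); it sits next to analyzer_customizado in the settings, and the url field is switched to it:

```json
"analyzer": {
  "analyzer_url": {
    "filter": ["lowercase", "asciifolding"],
    "tokenizer": "lowercase"
  }
},
...
"url": {
  "analyzer": "analyzer_url",
  "type": "text"
}
```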
I cannot use ngrams in my mapping because I use the Phrase Suggester, and when I use ngrams the suggestions are generated from hundreds of tokens, which makes them inaccurate.
Would anyone know a solution so that my mapping generates numeric tokens in the title and description fields, while the url field continues to break website links into several tokens ("www", "nasa", "com") instead of keeping the link whole ("www.nasa.com"), and so that my query still works with the AND operator searching all fields at the same time?
Regarding the part of the question where adding "1960" to the search returns nothing: in the following index mapping, I have removed synonym_filter. After removing it, indexing the sample document, and running the same search query as in the question, I am able to get the desired result.
Index Mapping:
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "0",
"analysis": {
"filter": {
"stemmer_plural_portugues": {
"name": "minimal_portuguese",
"stopwords": [
"http",
"https",
"ftp",
"www"
],
"type": "stemmer"
},
"shingle_filter": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
},
"analyzer": {
"analyzer_customizado": {
"filter": [
"lowercase",
"stemmer_plural_portugues",
"asciifolding",
"shingle_filter"
],
"tokenizer": "lowercase"
}
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "long"
},
"data": {
"type": "date"
},
"quebrado": {
"type": "byte"
},
"pgrk": {
"type": "integer"
},
"url_length": {
"type": "integer"
},
"title": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"description": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"url": {
"analyzer": "analyzer_customizado",
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
}
Search Query:
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "1960 space nasa rocket",
"type": "cross_fields",
"fields": [
"title",
"description",
"url"
],
"operator": "and"
}
}
}
Search Result:
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "1",
"_score": 0.9370217,
"_source": {
"title": "rocket 1960",
"description": "space",
"url": "www.nasa.com"
}
}
]
As stated by @Gibbs, I think there is some issue in synonym_filter, so it would be better if you share synonym.txt; otherwise, the search query runs perfectly.
Update 1: (including synonym_filter)
If you want to keep the synonym token filter, leave the index mapping the same as yours and make just one change in it:
"synonym_filter": {
"type": "synonym",
"lenient": true,
"synonyms_path": "analysis/synonym.txt",
"updateable" : false --> set this to false
},
You set your synonym filter to "updateable", presumably because you want to change synonyms without having to close and reopen the index, using the reload API instead. Updateable synonyms restrict the analyzer they are used in to search time only.
To get the full explanation of this, you can refer to this ES discussion
Use the same search query as above after making this change in the mapping, and you will get your desired result.
But if you still want to set "updateable": true, then you can refer to the official documentation of the Reload search analyzers API.
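In that case the updateable analyzer can only be referenced as a search_analyzer (with a separate index-time analyzer that omits the synonym filter), and after editing analysis/synonym.txt on each node the synonyms can be reloaded without reopening the index. A sketch, with my-index as a placeholder:

```json
POST /my-index/_reload_search_analyzers
```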