I am trying to write an Elasticsearch query that returns all restaurants whose name contains the substring 'pizz' but contains neither 'pizza' nor 'pizzeria'.
The query I wrote for this purpose is this:
GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "RestaurantName": {
              "value": "*pizz*"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "RestaurantName": "pizza"
          }
        },
        {
          "match": {
            "RestaurantName": "pizzeria"
          }
        }
      ]
    }
  }
}
This query matches names like Instapizza, which is wrong. It should match combined or capitalized cases like: Fozzie's Pizzaiolo, PizzaVito, Pizzalicious. How can I fix the query so that it no longer matches the unwanted names? Any help with this would be really great.
When 'RestaurantName' is indexed as a text field, it is analyzed with the standard analyzer, which includes the lowercase token filter. Every token stored in Lucene is therefore lowercase, so a search on that field is case-insensitive and cannot tell 'Instapizza' apart from 'PizzaVito'.
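To make the effect of lowercasing concrete, here is a rough Python simulation (not Elasticsearch itself). The helper names `rough_standard_tokens`, `wildcard_on_text` and `wildcard_on_keyword` are made up for illustration, the tokenizer is only an approximation of the standard analyzer, and `fnmatchcase` stands in for the wildcard query.

```python
import re
from fnmatch import fnmatchcase


def rough_standard_tokens(text):
    """Approximate the standard analyzer: split into words, then lowercase.

    Assumption: the real analyzer uses Unicode text segmentation; this
    regex is only close enough for the restaurant names used here.
    """
    return [tok.lower() for tok in re.findall(r"[\w']+", text)]


def wildcard_on_text(name, pattern):
    """Wildcard against an analyzed text field: tested per lowercase token."""
    return any(fnmatchcase(tok, pattern) for tok in rough_standard_tokens(name))


def wildcard_on_keyword(name, pattern):
    """Wildcard against a keyword field: tested on the original string."""
    return fnmatchcase(name, pattern)


for name in ["Instapizza", "PizzaVito", "Fozzie's Pizzaiolo"]:
    print(
        name,
        rough_standard_tokens(name),
        wildcard_on_text(name, "*pizz*"),
        wildcard_on_keyword(name, "*Pizz*"),
    )
```

In this sketch the lowercase pattern matches all three names on the analyzed field, while the case-sensitive pattern on the untouched string rejects Instapizza.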
First, add a keyword sub-field to the RestaurantName field, so the original, case-preserved value is indexed as well:
{
  "mappings": {
    "properties": {
      "RestaurantName": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
Then run the wildcard against the keyword sub-field, using a capitalized pattern:
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "RestaurantName.keyword": {
              "value": "*Pizz*"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "RestaurantName": "pizza"
          }
        },
        {
          "match": {
            "RestaurantName": "pizzeria"
          }
        }
      ]
    }
  }
}
The result is:
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "pizza",
        "_type": "_doc",
        "_id": "1L6ob4cB6Rdc8HbDY8vi",
        "_score": 1.0,
        "_source": {
          "RestaurantName": "Fozzie's Pizzaiolo"
        }
      },
      {
        "_index": "pizza",
        "_type": "_doc",
        "_id": "1b6ob4cB6Rdc8HbDg8tA",
        "_score": 1.0,
        "_source": {
          "RestaurantName": "PizzaVito"
        }
      },
      {
        "_index": "pizza",
        "_type": "_doc",
        "_id": "1r6ob4cB6Rdc8HbDmMuJ",
        "_score": 1.0,
        "_source": {
          "RestaurantName": "Pizzalicious"
        }
      }
    ]
  }
}
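As a sanity check on why exactly these three documents come back, the bool query can be mimicked in plain Python. This is only a sketch under assumptions: `rough_tokens` and `bool_query` are illustrative names, the tokenizer approximates the standard analyzer, and the names "Pizza Palace" and "Pizzeria Roma" are hypothetical entries added to show the must_not clauses at work.

```python
import re
from fnmatch import fnmatchcase


def rough_tokens(text):
    """Approximate the analyzed text field: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[\w']+", text)]


def bool_query(name):
    """Mimic the query: must = wildcard on keyword, must_not = match on text."""
    # must: case-sensitive wildcard on the raw (keyword) value
    must = fnmatchcase(name, "*Pizz*")
    # must_not: a match query hits when a whole analyzed token equals the term
    tokens = rough_tokens(name)
    must_not = "pizza" in tokens or "pizzeria" in tokens
    return must and not must_not


names = [
    "Fozzie's Pizzaiolo",
    "PizzaVito",
    "Pizzalicious",
    "Instapizza",     # fails the wildcard: lowercase 'p'
    "Pizza Palace",   # hypothetical: excluded by must_not 'pizza'
    "Pizzeria Roma",  # hypothetical: excluded by must_not 'pizzeria'
]

hits = [name for name in names if bool_query(name)]
print(hits)
```

Note that the must_not clauses only exclude names where 'pizza' or 'pizzeria' is a standalone word; PizzaVito survives because its single token is 'pizzavito'.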