Search code examples
elasticsearchsearchlogstashkibanafuzzy-search

Need help to write elasticsearch query which should search based on text of one field and another array field with (passed values or null)


I have data in the mysql database which have been stored into elasticsearch as documents using logstash job.

**Data example in database:**

firstname(text), lastname(text), email(text), tags(text & nullable)

Here, tags conains value as per this in mysql : "t1,t2,t3" or ""

While adding data into elasticsearch, custom analyzer/tokenizer has been applied using ",(comma)".

I need data as per this and need to write the query for this:

firstname = "text", tags = ["t1","t2"]

"Get all records which contain this firstname and tags with either "t1 or t2 or (t1 & t2) or null" (any in which tags are empty)".

I have tried a few queries to get this kind of result but nothing worked. (It doesn't return records with null tags as well tags with (t1,t2))

GET /posts/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "some_text",
                  "fields": [
                    "firstname^1.0"
                  ]
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "terms": {
                        "tags": [
                          "t2"
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "some_text",
                  "fields": [
                    "firstname^1.0"
                  ]
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "terms": {
                        "tags": [
                          ""
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
GET /posts/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "some_text",
                  "fields": [
                    "firstname^1.0"
                  ]
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "terms": {
                        "tags": [
                          "t2"
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "some_text",
                  "fields": [
                    "firstname^1.0"
                  ]
                }
              },
              {
                "bool": {
                   "must_not": {
                      "exists": {
                          "field": "tags"
                       }
                    }                
                 }
              }
            ]
          }
        }
      ]
    }
  }
}

Solution

  • Place the null check (nust_not -> exists) inside the should clause along with the terms query for tags.

    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "some_text",
                "fields": [
                  "firstname^1.0"
                ]
              }
            }
          ],
          "should": [
            {
              "terms": {
                "tags": [
                  "t1",
                  "t2",
                  ""
                ]
              }
            },
            {
              "bool": {
                "must_not": {
                  "exists": {
                    "field": "tags"
                  }
                }
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
    

    OR

    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "some_text",
                "fields": [
                  "firstname^1.0"
                ]
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "terms": {
                      "tags": [
                        "t1",
                        "t2",
                        ""
                      ]
                    }
                  },
                  {
                    "bool": {
                      "must_not": {
                        "exists": {
                          "field": "tags"
                        }
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }