java elasticsearch elasticsearch-high-level-restclient

Elasticsearch document search by qualification


My problem is very similar to this one: Elasticsearch : search document with conditional filter. However, my data structure is a bit different, so I can't use the solution from that thread. I have many documents indexed, and each one carries what we call qualifiers, which tell me whether the document should be shown or not. This is where my problems start. Here's an example:

{
    "locale": "en_US",
    "type": "bla",
    "qualifiers": [
        {
            "criteria": [
                {
                    "type": "year_range",
                    "lower": 2024,
                    "upper": 2027
                },
                {
                    "type": "ids",
                    "values": [1, 20]
                },
                {
                    "type": "string_range_term",
                    "lower": "123455",
                    "upper": "zzzzzz"
                }
            ]
        },
        {
            "criteria": [
                {
                    "type": "year_range",
                    "lower": 2010,
                    "upper": 2012
                }
            ]
        }
    ]
}

As an input, I need to provide all the parameters: year, ids, expiration date. The document needs to be matched in the following way:

  • a document is returned only if every criterion of at least one qualifier matches
  • the input params will always specify all possible values
  • if a criterion type is present in a qualifier, it must match the input; if it is missing, the corresponding input value is ignored

Examples:

  1. input year: 2011, id: 10, term: aaaa
  • should match, because the second qualifier contains a year_range matching the input; the other params are ignored as this qualifier has a single criterion
  2. input year: 2013, id: 1, term: zzzzzz
  • should not match, because the year does not match, even though the id and the term do
  3. input year: 2025, id: 20, term: zzzzzz
  • should match, because all criteria of the first qualifier match
  4. input year: 2025, id: 50, term: zzzzzz
  • should not match, as none of the IDs match
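
To pin down the intended semantics before translating them into a query, the rules and the four examples can be written as a small in-memory check. This is only a sketch in plain Java, not Elasticsearch code: the class QualifierMatcher, its Input holder and the helper names are made up for illustration, and it assumes that ids means "the input id is one of the listed values" and that string_range_term compares strings lexicographically.

```java
import java.util.List;
import java.util.function.Predicate;

class QualifierMatcher {

    /** Search input; the caller always supplies all three values. */
    static final class Input {
        final int year;
        final int id;
        final String term;

        Input(int year, int id, String term) {
            this.year = year;
            this.id = id;
            this.term = term;
        }
    }

    /** year_range: lower <= year <= upper. */
    static Predicate<Input> yearRange(int lower, int upper) {
        return in -> lower <= in.year && in.year <= upper;
    }

    /** ids: assumed to mean "the input id is one of the listed values". */
    static Predicate<Input> ids(Integer... values) {
        List<Integer> allowed = List.of(values);
        return in -> allowed.contains(in.id);
    }

    /** string_range_term: assumed to be a lexicographic range check. */
    static Predicate<Input> stringRangeTerm(String lower, String upper) {
        return in -> lower.compareTo(in.term) <= 0 && in.term.compareTo(upper) <= 0;
    }

    /** A document matches if ANY qualifier has ALL of its criteria satisfied. */
    static boolean matches(List<List<Predicate<Input>>> qualifiers, Input in) {
        return qualifiers.stream()
                .anyMatch(criteria -> criteria.stream().allMatch(c -> c.test(in)));
    }

    /** The two qualifiers of the sample document from the question. */
    static List<List<Predicate<Input>>> sampleDocument() {
        return List.of(
                List.of(yearRange(2024, 2027), ids(1, 20), stringRangeTerm("123455", "zzzzzz")),
                List.of(yearRange(2010, 2012)));
    }

    public static void main(String[] args) {
        List<List<Predicate<Input>>> doc = sampleDocument();
        System.out.println(matches(doc, new Input(2011, 10, "aaaa")));   // true
        System.out.println(matches(doc, new Input(2013, 1, "zzzzzz")));  // false
        System.out.println(matches(doc, new Input(2025, 20, "zzzzzz"))); // true
        System.out.println(matches(doc, new Input(2025, 50, "zzzzzz"))); // false
    }
}
```

A criterion type that is absent from a qualifier simply contributes no predicate, which is how the "must match OR must be missing" rule falls out of the all-criteria check.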

I'd be grateful for any hints or advice, as I've been struggling with this for 4 days now without promising results. I'm wondering whether I should reorganize the data, since it is almost 1:1 with the database structure, where it works just fine. However, I need to speed up the document selection a bit, so I wanted to move the documents to ES. I'll need to implement this in Java using the REST client, but once I have the proper query in place I'll be able to convert it into Java code :) Thank you in advance.


Solution

  • You can use an Elasticsearch bool query and put AND logic inside OR logic. The qualifiers field must be mapped as nested, because it holds an array of objects whose fields have to be matched together. Here is how you can do it with must and should clauses.

    GET search_by_quafilication/_search
    {
      "query": {
        "nested": {
          "path": "qualifiers",
          "query": {
            "bool": {
              "should": [    <-- OR logic start
                {
                  "bool": {
                    "must": [  <-- first AND logic inside of OR
                      {}
                    ]
                  }
                },
                {
                  "bool": {
                    "must": [  <-- second AND logic inside of OR
                      {}
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }
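
Since the question asks for Java in the end, the same should/must skeleton can also be assembled programmatically. The following is a minimal sketch that only concatenates strings in plain Java; the class and method names (NestedOrOfAndsQuery, range, allOf, anyOf, nested) are made up for illustration and are not part of any Elasticsearch client, and it assumes field names and paths need no JSON escaping.

```java
import java.util.List;

class NestedOrOfAndsQuery {

    /** Leaf clause: numeric range on a single field. */
    static String range(String field, long gte, long lte) {
        return "{\"range\":{\"" + field + "\":{\"gte\":" + gte + ",\"lte\":" + lte + "}}}";
    }

    /** AND: every clause has to match (bool/must). */
    static String allOf(List<String> clauses) {
        return "{\"bool\":{\"must\":[" + String.join(",", clauses) + "]}}";
    }

    /** OR: at least one group has to match (bool/should). */
    static String anyOf(List<String> groups) {
        return "{\"bool\":{\"should\":[" + String.join(",", groups) + "]}}";
    }

    /** Wraps the query so that it is evaluated per nested object under the given path. */
    static String nested(String path, String query) {
        return "{\"query\":{\"nested\":{\"path\":\"" + path + "\",\"query\":" + query + "}}}";
    }

    /** Two AND groups combined with OR, using the year and ids ranges as sample clauses. */
    static String build() {
        String first = allOf(List.of(
                range("qualifiers.year", 2024, 2027),
                range("qualifiers.ids", 1, 20)));
        String second = allOf(List.of(
                range("qualifiers.year", 2010, 2012)));
        return nested("qualifiers", anyOf(List.of(first, second)));
    }

    public static void main(String[] args) {
        // The printed string can be used as the body of a _search request.
        System.out.println(build());
    }
}
```

With a real client library, the string helpers would typically be replaced by that library's query builders; the nesting of the clauses stays the same.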
    

    The full example:

    PUT search_by_quafilication
    {
      "mappings": {
        "properties": {
          "qualifiers": {
            "type": "nested"
          }
        }
      }
    }
    # check the mapping
    GET search_by_quafilication

    PUT search_by_quafilication/_doc/1
    {
      "qualifiers": {
        "year": 2011,
        "ids": 10,
        "string": "aaaa"
      }
    }
    
    PUT search_by_quafilication/_doc/2
    {
      "qualifiers": {
        "year": 2013,
        "ids": 1,
        "string": "zzzzzz"
      }
    }
    
    PUT search_by_quafilication/_doc/3
    {
      "qualifiers": {
        "year": 2025,
        "ids": 20,
        "string": "zzzzzz"
      }
    }
    
    PUT search_by_quafilication/_doc/4
    {
      "qualifiers": {
        "year": 2025,
        "ids": 50,
        "string": "zzzzzz"
      }
    }
    
    
    
    
    GET search_by_quafilication/_search
    {
      "query": {
        "nested": {
          "path": "qualifiers",
          "query": {
            "bool": {
              "should": [
                {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "qualifiers.year": {
                            "gte": 2024,
                            "lte": 2027
                          }
                        }
                      },
                      {
                        "regexp": {
                          "qualifiers.string": "[A-Za-z]{6}"
                        }
                      },
                      {
                        "range": {
                          "qualifiers.ids": {
                            "gte": 1,
                            "lte": 20
                          }
                        }
                      }
                    ]
                  }
                },
                {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "qualifiers.year": {
                            "gte": 2010,
                            "lte": 2012
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }
    

    The output will only hit the examples you shared, that is, id:1 and id:3.
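
The nested mapping starts to matter once qualifiers holds more than one object, as in the question's data. With the default object mapping, Elasticsearch pools the values of all array elements per field, so two conditions can be satisfied by two different objects. The sketch below imitates both behaviours in plain Java; the Qualifier class with its year and id fields is a simplified, hypothetical stand-in for one array element, not Elasticsearch code.

```java
import java.util.List;

class NestedVsFlattened {

    /** One element of the qualifiers array, reduced to two fields for the demo. */
    static final class Qualifier {
        final int year;
        final int id;

        Qualifier(int year, int id) {
            this.year = year;
            this.id = id;
        }
    }

    /** Object-array semantics: each condition may be satisfied by a different element. */
    static boolean flattenedMatch(List<Qualifier> qualifiers, int year, int id) {
        boolean someYear = qualifiers.stream().anyMatch(q -> q.year == year);
        boolean someId = qualifiers.stream().anyMatch(q -> q.id == id);
        return someYear && someId;
    }

    /** Nested semantics: both conditions must hold on the same element. */
    static boolean nestedMatch(List<Qualifier> qualifiers, int year, int id) {
        return qualifiers.stream().anyMatch(q -> q.year == year && q.id == id);
    }

    public static void main(String[] args) {
        List<Qualifier> qualifiers = List.of(new Qualifier(2011, 10), new Qualifier(2025, 20));
        System.out.println(flattenedMatch(qualifiers, 2011, 20)); // true: cross-object false positive
        System.out.println(nestedMatch(qualifiers, 2011, 20));    // false
    }
}
```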
