Distributivity of 'must' over 'should' in Elasticsearch queries


I am puzzled by the following behavior when querying my index.

Whether you read it as boolean logic or as sets (OR being a union and AND being an intersection), I take for granted that X AND (Y OR Z) = (X AND Y) OR (X AND Z). In the following examples:

  • X: AnneeConstructionLogement < 1960
  • Y: ResultatGlobalAmiante = true
  • Z: TypeDiagnosticAmiante = "DAT"
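The identity itself can be checked exhaustively. The following is a minimal sketch in plain Python (nothing Elasticsearch-specific; the function names `left` and `right` are made up for illustration) that compares both forms on every combination of truth values:

```python
from itertools import product

def left(x, y, z):
    # X AND (Y OR Z)
    return x and (y or z)

def right(x, y, z):
    # (X AND Y) OR (X AND Z)
    return (x and y) or (x and z)

# Compare both forms on all 8 combinations of truth values.
mismatches = [
    combo for combo in product([False, True], repeat=3)
    if left(*combo) != right(*combo)
]
print(mismatches)  # prints []
```

Since the list of mismatches is empty, any difference in hit counts has to come from how the clauses are written or evaluated, not from the logic.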

X AND (Y OR Z)

{
 "query": {
    "bool": {
      "must": [
        {
          "range": {
            "AnneeConstructionLogement.keyword": {
              "lt": 1960
            }
          }
        },
        {
          "bool": {
            "should": [
                {"term": {
                    "ResultatGlobalAmiante.keyword": true
                }},
                {"term": {
                    "TypeDiagnosticAmiante.keyword": "DAT"
                }}
              ]
          }
        }
      ]
    }
 }
}

This gives me 37 hits.

(X AND Y) OR (X AND Z)

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "ResultatGlobalAmiante": true
                }
              },
              {
                "range": {
                  "AnneeConstructionLogement.keyword": {
                    "lt": 1960
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "TypeDiagnosticAmiante.keyword": "DAT"
                }
              },
              {
                "range": {
                  "AnneeConstructionLogement.keyword": {
                    "lt": 1960
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

This gives me 102 hits, which I find surprising, since the two queries are logically equivalent (or, at least, I do not see any difference between them). Even more surprising, the KQL I started from,

_index : ace-logement and AnneeConstructionLogement <= "1960" and ResultatGlobalAmiante: true or _index : ace-logement and AnneeConstructionLogement <= "1960" and TypeDiagnosticAmiante: DAT

gives me 134 hits.
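One difference between the KQL and the two DSL queries is visible in the text itself: the KQL compares with `<=`, while both DSL queries use `lt`. A minimal sketch with made-up years shows how the boundary value alone changes the result; this may well not be the only cause of the gap between the counts:

```python
# Made-up sample values, only to illustrate the boundary case.
years = [1955, 1959, 1960, 1961]

lt_hits = [y for y in years if y < 1960]    # like "lt": 1960 in the DSL
lte_hits = [y for y in years if y <= 1960]  # like <= 1960 in the KQL

print(lt_hits)
print(lte_hits)
```

Any document dated exactly 1960 is counted by the KQL but excluded by the DSL range.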

Is it valid to map AND and OR onto must and should in this way? Is the mismatch a matter of logic or of implementation?
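Comparing the two queries as posted, they do not use the same field names: the first filters on `ResultatGlobalAmiante.keyword`, the second on `ResultatGlobalAmiante`. The sketch below is a toy evaluator for a tiny subset of the query DSL, run over made-up in-memory documents. It is not Elasticsearch. It assumes that a clause on a field absent from a document matches nothing, and that a `bool` with `should` but no `must` needs at least one `should` clause to match. The helper names (`matches`, `nested`, `distributed`, `count`) and the value `"OTHER"` are hypothetical.

```python
def matches(query, doc):
    """Toy evaluator for a small subset of the query DSL (not Elasticsearch)."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return field in doc and doc[field] == value
    if kind == "range":
        field, bounds = next(iter(body.items()))
        return field in doc and doc[field] < bounds["lt"]
    if kind == "bool":
        must_ok = all(matches(q, doc) for q in body.get("must", []))
        should = body.get("should", [])
        # Assumption: should clauses are required only when there is no must clause.
        should_ok = (not should) or ("must" in body) or any(
            matches(q, doc) for q in should
        )
        return must_ok and should_ok
    raise ValueError(f"unsupported query type: {kind}")

def nested(x, y, z):
    # X AND (Y OR Z)
    return {"bool": {"must": [x, {"bool": {"should": [y, z]}}]}}

def distributed(x, y, z):
    # (X AND Y) OR (X AND Z)
    return {"bool": {"should": [
        {"bool": {"must": [y, x]}},
        {"bool": {"must": [z, x]}},
    ]}}

# Made-up documents; "OTHER" is a placeholder diagnostic type.
docs = [
    {"AnneeConstructionLogement": 1950, "ResultatGlobalAmiante": True, "TypeDiagnosticAmiante": "DAT"},
    {"AnneeConstructionLogement": 1955, "ResultatGlobalAmiante": True, "TypeDiagnosticAmiante": "OTHER"},
    {"AnneeConstructionLogement": 1958, "ResultatGlobalAmiante": False, "TypeDiagnosticAmiante": "DAT"},
    {"AnneeConstructionLogement": 1959, "ResultatGlobalAmiante": False, "TypeDiagnosticAmiante": "OTHER"},
    {"AnneeConstructionLogement": 1970, "ResultatGlobalAmiante": True, "TypeDiagnosticAmiante": "DAT"},
]

def count(query):
    return sum(matches(query, d) for d in docs)

X = {"range": {"AnneeConstructionLogement": {"lt": 1960}}}
Y = {"term": {"ResultatGlobalAmiante": True}}
Z = {"term": {"TypeDiagnosticAmiante": "DAT"}}
# Same clause as Y, but on a subfield that none of the toy documents has.
Y_KEYWORD = {"term": {"ResultatGlobalAmiante.keyword": True}}

same_fields = (count(nested(X, Y, Z)), count(distributed(X, Y, Z)))
mixed_fields = (count(nested(X, Y_KEYWORD, Z)), count(distributed(X, Y, Z)))
print(same_fields)   # prints (3, 3)
print(mixed_fields)  # prints (2, 3)
```

With identical clauses both shapes return the same count. The counts diverge only when one query targets a different field name, which is consistent with the must/should mapping being sound and the mismatch coming from the fields.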


Solution

  • The problem came from the use of .keyword (I am not sure why, and would be interested to know). Thanks, @ilvar. With .keyword removed from every field, both queries finally return the same number of hits:

    {
     "query": {
        "bool": {
          "must": [
            {
              "range": {
                "AnneeConstructionLogement": {
                  "lt": 1960
                }
              }
            },
            {
              "bool": {
                "should": [
                    {"term": {
                        "ResultatGlobalAmiante": true
                    }},
                    {"term": {
                        "TypeDiagnosticAmiante": "DAT"
                    }}
                  ]
              }
            }
          ]
        }
     }
    }
    
    
    {
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "ResultatGlobalAmiante": true
                    }
                  },
                  {
                    "range": {
                      "AnneeConstructionLogement": {
                        "lt": 1960
                      }
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "TypeDiagnosticAmiante": "DAT"
                    }
                  },
                  {
                    "range": {
                      "AnneeConstructionLogement": {
                        "lt": 1960
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
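The thread does not establish why `.keyword` made a difference. One possible factor, assuming the `.keyword` subfield holds the value as a string, is that a range on a keyword field compares terms in lexicographic order rather than numerically. A minimal sketch with made-up values:

```python
# Made-up years stored as strings, as a keyword subfield would hold them.
values = ["1875", "1959", "1960", "2001", "950"]

as_strings = [v for v in values if v < "1960"]     # lexicographic comparison
as_numbers = [v for v in values if int(v) < 1960]  # numeric comparison

print(as_strings)  # prints ['1875', '1959']
print(as_numbers)  # prints ['1875', '1959', '950']
```

For four-digit years the two orders agree, so this alone may not explain the counts above. Another possibility is that a `.keyword` subfield simply does not exist for a field mapped as a number or boolean, in which case a clause on it would match nothing. Checking the index mapping would settle which case applies.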