Nested documents and boolean query with Elasticsearch


I'm trying to use a must_not boolean query on nested documents, but I keep getting unexpected results.

Here is an example to illustrate my issue.

curl -X DELETE "http://localhost:9200/must_again/"
curl -X POST "http://localhost:9200/must_again/" -d '{
  "mappings": {
    "class": {
      "properties": {
        "title": {
          "type": "string"
        },
        "teachers": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/must_again/class/1' -d '{
  "title": "class1",
  "teachers": [
    {
      "name": "alex"
    },
    {
      "name": "steve"
    }
  ]
}'

curl -XPUT 'http://localhost:9200/must_again/class/2' -d '{
  "title": "class2",
  "teachers": [
    {
      "name": "alex"
    }
  ]
}'

curl -XPUT 'http://localhost:9200/must_again/class/3' -d '{
  "title": "class3",
  "teachers": []
}'

At this point, I have 3 classes: one where steve is teaching (alongside alex), one where only alex is teaching, and one where there is no teacher.

My goal is to get the last 2, i.e. every class where steve is not teaching.

The query I was working with is

curl -XGET 'http://localhost:9200/must_again/class/_search' -d '{
  "query": {
    "nested": {
      "path": "teachers",
      "query": {
        "bool": {
          "must_not": [
            {
              "match": {
                "teachers.name": "steve"
              }
            }
          ]
        }
      }
    }
  }
}'

This returns

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "must_again",
        "_type": "class",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "title": "class2",
          "teachers": [
            {
              "name": "alex"
            }
          ]
        }
      },
      {
        "_index": "must_again",
        "_type": "class",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "title": "class1",
          "teachers": [
            {
              "name": "alex"
            },
            {
              "name": "steve"
            }
          ]
        }
      }
    ]
  }
}

So class2 is expected, but class1 should not be returned, and class3 is missing.

If I run the same query with must instead of must_not, I do get the right result (only class1).

Not sure what I'm doing wrong?


Solution

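  • A workaround: put the must_not outside the nested query instead of inside it.
    A nested query matches a class as soon as any one of its teachers matches the
    inner query. With must_not inside, class1 matches through alex (who is not
    steve), and class3 never matches because it has no teachers to test. Negating
    the whole nested query excludes every class that has a steve and keeps the rest.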

    curl -XPOST "http://localhost:9200/must_again/class/_search" -d'
    {
       "query": {
          "bool": {
             "must_not": [
                {
                   "nested": {
                      "path": "teachers",
                      "query": {
                         "bool": {
                            "must": [
                               {
                                  "match": {
                                     "teachers.name": "steve"
                                  }
                               }
                            ]
                         }
                      }
                   }
                }
             ]
          }
       }
    }'
    

    Hope this helps!!
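
To see why the position of must_not matters, here is a minimal sketch in plain Python that imitates the "any nested document matches" rule. It is only a toy model, not Elasticsearch itself: the `classes` dict and the helper names `nested_matches` and `is_steve` are made up for illustration, and exact string comparison stands in for the match query.

```python
# Toy model of nested-query semantics; this is NOT the real engine.
# The three classes from the question, keyed by title.
classes = {
    "class1": ["alex", "steve"],
    "class2": ["alex"],
    "class3": [],
}


def nested_matches(teachers, predicate):
    """A parent matches if ANY of its nested documents satisfies the inner query."""
    return any(predicate(name) for name in teachers)


def is_steve(name):
    # Stand-in for {"match": {"teachers.name": "steve"}} (hypothetical helper).
    return name == "steve"


# Question's query: must_not sits INSIDE the nested query,
# so each teacher is tested individually for "not steve".
inner_must_not = sorted(
    title for title, teachers in classes.items()
    if nested_matches(teachers, lambda name: not is_steve(name))
)

# Answer's query: must_not WRAPS the nested query,
# so a class is dropped if any of its teachers is steve.
outer_must_not = sorted(
    title for title, teachers in classes.items()
    if not nested_matches(teachers, is_steve)
)

print(inner_must_not)  # ['class1', 'class2']
print(outer_must_not)  # ['class2', 'class3']
```

The first list reproduces the result from the question (class1 slips in via alex, class3 has nothing to match), and the second is the desired result.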