Tags: elasticsearch, elasticsearch-5, elasticsearch-aggregation, elasticsearch-dsl

Elasticsearch multi index filter


I have 2 indexes, one that stores users and one that stores articles.

User document:

{
  "id" : "152ce52d-e975-4ebd-849a-0a12f535e644",
  "email": "bob@email.com",
  ...
}

Index mapping:

{
  "mapping": {
    "properties": {
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "email": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Article document:

{
  "articleId" : "002ce52d-e975-4ebd-849a-0a12f535a536",
  "userId": "152ce52d-e975-4ebd-849a-0a12f535e644",
  ...
}

Index mapping:

{
  "mapping": {
    "properties": {
      "articleId": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "userId": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

I'm trying to create a multi-index query that returns all users that have no articles.

I can't figure out how to exclude from the first index items contained in the second index and having the same id/userId field.

Let's say I have the following users and articles:

Users:

 [
        {
            "id": "user1",
            "email": "bob@email.com"
        },
        {
            "id": "user2",
            "email": "john@email.com"
        },
        {
            "id": "user3",
            "email": "tom@email.com"
        },
        {
            "id": "user4",
            "email": "ben@email.com"
        }
    ]

Articles:

[
    {
        "articleId": "id1",
        "userId": "user1"
    },
    {
        "articleId": "id2",
        "userId": "user2"
    }
]

As a result of the query I want to have all the users that have no articles:

[
    {
        "id": "user3",
        "email": "tom@email.com"
    },
    {
        "id": "user4",
        "email": "ben@email.com"
    }
]

Solution

  • As @leandrojmp pointed out, what you are trying to achieve resembles a JOIN in SQL (more precisely, an anti-join): you want to return every record from one index that has no matching record in the second index. Refer to this documentation to learn more about joining queries in Elasticsearch.

    "I can't figure out how to exclude from the first index items contained in the second index and having the same id/userId field."

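    You can use the _index field when querying across multiple indexes. Note that the query below does not look up the article authors by itself: the ids of the users that already have articles (user1 and user2) are listed explicitly in the must_not clauses.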

    Search Query:

    {
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must_not": [
                  {
                    "match": {
                      "id": "user1"
                    }
                  },
                  {
                    "match": {
                      "id": "user2"
                    }
                  }
                ],
                "must": {
                  "term": {
                    "_index": "64469393user"
                  }
                }
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "match": {
                      "userId": "user1"
                    }
                  },
                  {
                    "match": {
                      "userId": "user2"
                    }
                  }
                ],
                "must": {
                  "term": {
                    "_index": "64469393article"
                  }
                }
              }
            }
          ]
        }
      }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "64469393user",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "id": "user3",
              "email": "tom@email.com"
            }
          },
          {
            "_index": "64469393user",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "id": "user4",
              "email": "ben@email.com"
            }
          }
        ]
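
Because the query above needs the ids of users with articles to be known in advance, one way to obtain them is a two-step approach: first collect the distinct userId values from the article index with a terms aggregation, then exclude those values when searching the user index. The Python sketch below only builds the two request bodies and does not talk to a cluster. It assumes the index names used in this answer and the .keyword sub-fields from the mappings in the question. The helper names and the max_authors bound are hypothetical, and the sample response is hand-written to mirror the shape of a terms aggregation response, not real output.

```python
import json
from typing import Any, Dict, Iterable, List

# Index names as used in the answer above; the ".keyword" sub-fields come
# from the mappings shown in the question.
USER_INDEX = "64469393user"
ARTICLE_INDEX = "64469393article"


def author_ids_request(max_authors: int = 10000) -> Dict[str, Any]:
    """Step 1 body (sent to the article index): distinct userId values.

    max_authors is an assumed upper bound; a terms aggregation returns at
    most `size` buckets.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the article hits
        "aggs": {
            "authors": {
                "terms": {"field": "userId.keyword", "size": max_authors}
            }
        },
    }


def extract_author_ids(response: Dict[str, Any]) -> List[str]:
    """Pull the bucket keys (the userId values) out of the step 1 response."""
    buckets = response["aggregations"]["authors"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def users_without_articles_request(author_ids: Iterable[str]) -> Dict[str, Any]:
    """Step 2 body: users whose id is not among the collected author ids."""
    return {
        "query": {
            "bool": {
                # Only relevant when the request targets several indices.
                "filter": [{"term": {"_index": USER_INDEX}}],
                "must_not": [{"terms": {"id.keyword": list(author_ids)}}],
            }
        }
    }


if __name__ == "__main__":
    # Hand-written stand-in for the step 1 response on the sample data.
    sample_response = {
        "aggregations": {
            "authors": {
                "buckets": [
                    {"key": "user1", "doc_count": 1},
                    {"key": "user2", "doc_count": 1},
                ]
            }
        }
    }
    ids = extract_author_ids(sample_response)
    print(json.dumps(users_without_articles_request(ids), indent=2))
```

Using terms on id.keyword instead of match on the analyzed id field compares the whole id as a single value, which may matter for ids such as the UUIDs in the question, depending on the analyzer in use. With a large number of authors a single terms aggregation may not return every value, so the ids would have to be fetched in pages.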