Filter documents on items in an array in Elasticsearch


I am using Elasticsearch to search through documents. However, I need to make sure the current user is allowed to see those documents. Each document is tied to one or more communities, to which the user may belong.

Here is the mapping for my Document:

export const mapping = {
  properties: {
    amazonId: { type: 'text' },
    title: { type: 'text' },
    subtitle: { type: 'text' },
    description: { type: 'text' },
    createdAt: { type: 'date' },
    updatedAt: { type: 'date' },
    published: { type: 'boolean' },
    communities: { type: 'nested' }
  }
}

I'm currently saving the ids of the communities the document belongs to in an array of strings. Ex: ["edd05cd0-0a49-4676-86f4-2db913235371", "672916cf-ee32-4bed-a60f-9a7c08dba04b"]

Currently, when I filter a query with {term: { communities: community.id } }, it returns all the documents, regardless of the communities they're tied to.

Here's the full query:

{
  index: 'document',
  filter_path: { filter: {term: { communities: community.id } } },
  body: {
    sort: [{ createdAt: { order: 'asc' } }]
  }
}

This is the result for the community id "b7d28e7f-7534-406a-981e-ddf147b5015a". NOTE: This comes back from my GraphQL layer, so the communities on each document are full objects, resolved after the hits are returned from the ES query.

"hits": [
    {
      "title": "The One True Document",
      "communities": [
        {
          "id": "edd05cd0-0a49-4676-86f4-2db913235371"
        },
        {
          "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
        }
      ]
    },
    {
      "title": "Boring Document 1",
      "communities": []
    },
    {
      "title": "Boring Document 2",
      "communities": []
    },
    {
      "title": "Unpublished",
      "communities": [
        {
          "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
        }
       ]
    }
]

When I attempt to map the communities as {type: 'keyword', index: 'not_analyzed'} I receive an error that states, [illegal_argument_exception] Could not convert [communities.index] to boolean.

So do I need to change my mapping, my filter, or both? Searching around the docs for 6.6, I see that terms needs the non_analyzed mapping.

UPDATE --------------------------

I updated the communities mapping to be a keyword as suggested below. However, I still received the same result.

I updated my query to the following (using a community id that has documents):

query: { index: 'document',
  body: 
   { sort: [ { createdAt: { order: 'asc' } } ],
     from: 0,
     size: 5,
     query: 
      { bool: 
         { filter: 
            { term: { communities: '672916cf-ee32-4bed-a60f-9a7c08dba04b' } } } } } }

Which gives me the following results:

{
  "data": {
    "communities": [
      {
        "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b",
        "feed": {
          "documents": {
            "hits": []
          }
        }
      }
    ]
  }
}

It appears that my filter is working too well?


Solution

  • Since you are storing community ids, you should make sure the ids don't get analyzed. For this, communities should be of type keyword. Second, you want to store an array of community ids, since a document can belong to multiple communities. You don't need the nested type for that; nested has an altogether different use case. To store the values as an array, always pass them as an array when indexing, even when there is only a single value.

    You need to change both the mapping and the way you index values into the communities field.

    1. Update mapping as below:
    PUT my_index
    {
      "mappings": {
        "_doc": {
          "properties": {
            "amazonId": {
              "type": "text"
            },
            "title": {
              "type": "text"
            },
            "subtitle": {
              "type": "text"
            },
            "description": {
              "type": "text"
            },
            "createdAt": {
              "type": "date"
            },
            "updatedAt": {
              "type": "date"
            },
            "published": {
              "type": "boolean"
            },
            "communities": {
              "type": "keyword"
            }
          }
        }
      }
    }
    
    2. Adding a document to index:
    PUT my_index/_doc/1
    {
      "title": "The One True Document",
      "communities": [
        "edd05cd0-0a49-4676-86f4-2db913235371",
        "672916cf-ee32-4bed-a60f-9a7c08dba04b"
      ]
    }
    
    3. Filtering by community id:
    GET my_index/_doc/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "communities": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
              }
            }
          ]
        }
      }
    }
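
On the client side, the same idea could look roughly like the sketch below. It assumes the request shape from the question (an object with index and body, passed to the JavaScript client); toCommunityIds and buildCommunitySearch are hypothetical helper names, not part of any library. Note that filter_path, used in the original request, is a response-filtering parameter: it only trims which fields of the response come back and does not restrict which documents match. That is consistent with the original request returning every document. The filter has to live inside body.query.

```javascript
// Hypothetical helpers; the names are illustrative and not from the original post.

// Normalize whatever the application holds (a single id, an array of ids,
// or an array of { id } objects) into a flat array of id strings,
// which is the shape the keyword mapping above expects.
function toCommunityIds(communities) {
  if (communities == null) return [];
  const list = Array.isArray(communities) ? communities : [communities];
  return list.map((c) => (typeof c === 'string' ? c : c.id));
}

// Build the search request with the term filter inside body.query,
// not in filter_path.
function buildCommunitySearch(communityId, { from = 0, size = 5 } = {}) {
  return {
    index: 'document',
    body: {
      sort: [{ createdAt: { order: 'asc' } }],
      from,
      size,
      query: {
        bool: {
          filter: [{ term: { communities: communityId } }]
        }
      }
    }
  };
}
```

If documents were indexed before the mapping was changed, or with communities stored as objects rather than id strings, they would probably need to be reindexed in the new shape before this filter returns them.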
    

    Nested Field approach

    1. Mapping:
    PUT my_index_2
    {
      "mappings": {
        "_doc": {
          "properties": {
            "amazonId": {
              "type": "text"
            },
            "title": {
              "type": "text"
            },
            "subtitle": {
              "type": "text"
            },
            "description": {
              "type": "text"
            },
            "createdAt": {
              "type": "date"
            },
            "updatedAt": {
              "type": "date"
            },
            "published": {
              "type": "boolean"
            },
            "communities": {
              "type": "nested"
            }
          }
        }
      }
    }
    
    2. Indexing document:
    PUT my_index_2/_doc/1
    {
      "title": "The One True Document",
      "communities": [
        {
          "id": "edd05cd0-0a49-4676-86f4-2db913235371"
        },
        {
          "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
        }
      ]
    }
    
    3. Querying (using a nested query):
    GET my_index_2/_doc/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "nested": {
                "path": "communities",
                "query": {
                  "term": {
                    "communities.id.keyword": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
                  }
                }
              }
            }
          ]
        }
      }
    }
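
For completeness, here is a rough sketch of building that nested filter from application code. The helper names are hypothetical, the index name my_index_2 is taken from the example above, and the variant that accepts several ids (using a terms query) is an assumption about how one might cover a user who belongs to more than one community.

```javascript
// Hypothetical helper: wrap a term query on the nested id in a nested query.
function buildNestedCommunityFilter(communityId) {
  return {
    nested: {
      path: 'communities',
      query: {
        term: { 'communities.id.keyword': communityId }
      }
    }
  };
}

// Hypothetical variant for several ids: a document matches if any of its
// nested community objects has an id in the given list.
function buildNestedCommunitySearch(communityIds) {
  return {
    index: 'my_index_2',
    body: {
      query: {
        bool: {
          filter: [
            {
              nested: {
                path: 'communities',
                query: {
                  terms: { 'communities.id.keyword': communityIds }
                }
              }
            }
          ]
        }
      }
    }
  };
}
```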
    

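    You might have noticed that I used communities.id.keyword and not communities.id. The mapping above does not define the fields inside communities, so Elasticsearch maps id dynamically as a text field with a keyword sub-field. A term query is not analyzed, so it has to target the unanalyzed communities.id.keyword sub-field to match the full id.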
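
To make that concrete, the sketch below shows, as plain JavaScript objects in the style of the mapping in the question, what dynamic mapping typically produces for a string field, followed by an alternative that is not part of the answer above: mapping id explicitly as keyword, in which case a term query could target communities.id directly. The constant names are illustrative only.

```javascript
// What dynamic mapping typically generates for a string field such as
// communities.id when the mapping does not declare it explicitly.
const dynamicIdMapping = {
  type: 'text',
  fields: {
    keyword: { type: 'keyword', ignore_above: 256 }
  }
};

// Alternative sketch (an assumption, not from the answer above): declare id
// as keyword so the exact value is indexed under communities.id itself.
const explicitNestedMapping = {
  properties: {
    communities: {
      type: 'nested',
      properties: {
        id: { type: 'keyword' }
      }
    }
  }
};
```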