How to exclude a large set of IDs from Elasticsearch results?


I have a lot of products indexed in Elasticsearch. I need to exclude a list of IDs (which I fetch from a SQL database) from an Elasticsearch query. Suppose products are stored as:

{
  "id" : "1",
  "name" : "shirt",
  "size" : "xl"
}

We show a list of recommended products to a customer based on some algorithm using Elasticsearch. If a customer marks a product as 'Not Interested', we must not show them that product again. We keep such products in a separate SQL table with product_id, customer_id and status 'not_interested'.

Currently, when fetching recommendations for a customer at runtime, we get the list of 'not_interested' products from the SQL database and send the array of product_ids in a not filter in Elasticsearch to exclude them from the recommendations. The problem arises when the product_ids array becomes too large.

How should I store the product_id and customer_id mapping in Elasticsearch so that I can filter out the 'not_interested' products at runtime using Elasticsearch only?

Would it make sense to store them as nested objects or parent/child documents? Or is there some completely different way to store them so that I can exclude some IDs from the result efficiently?


Solution

  • You can exclude IDs (or any other literal strings) efficiently using a terms query.

    Both Elasticsearch and Solr support this, and it stays efficient even for long lists of IDs.

    In Elasticsearch, use the ids query. It is in effect a terms query on the _uid field (recent versions match against the _id field instead). Make sure you put this query in a must_not clause within a bool query. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html

    In Solr you can use the terms query within an fq, like fq=-{!terms f=id}doc334,doc125,doc777,doc321,doc253. Note the leading minus, which negates the filter. See: http://yonik.com/solr-terms-query/
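
As a rough illustration of both variants, here is a minimal Python sketch that only builds the request payloads, without sending them anywhere. The function names, the example match query and the sample IDs are made up for illustration. It also assumes that the Elasticsearch document _id equals the product id; if the id lives only in a regular field, a terms query on that field would take the place of the ids query.

```python
import json
from typing import Iterable, Optional


def build_recommendation_query(base_query: dict, excluded_ids: Iterable[str]) -> dict:
    """Wrap base_query in a bool query that drops the given document IDs.

    Hypothetical helper: assumes the document _id is the product id.
    """
    # Deduplicate and sort so the request body is deterministic.
    values = sorted({str(i) for i in excluded_ids})
    bool_query = {"must": [base_query]}
    if values:
        # The JSON key is must_not; the ids query lists the _id values to exclude.
        bool_query["must_not"] = [{"ids": {"values": values}}]
    return {"query": {"bool": bool_query}}


def build_solr_exclusion_fq(excluded_ids: Iterable[str], field: str = "id") -> Optional[str]:
    """Build a negated Solr terms filter such as -{!terms f=id}a,b,c.

    Hypothetical helper: returns None when there is nothing to exclude.
    """
    values = sorted({str(i) for i in excluded_ids})
    if not values:
        return None
    # The leading minus negates the filter query.
    return "-{!terms f=" + field + "}" + ",".join(values)


if __name__ == "__main__":
    # IDs that would come from the SQL 'not_interested' table (sample values).
    not_interested = ["17", "4", "17", "256"]
    body = build_recommendation_query({"match": {"size": "xl"}}, not_interested)
    print(json.dumps(body, indent=2))
    print(build_solr_exclusion_fq(not_interested))
```

The resulting dictionary would be sent as the body of a search request, and the Solr string as the value of an fq parameter. Leaving out must_not when the list is empty keeps the query unchanged for customers who have not rejected anything.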