Search code examples
mysqldatabasesolrsolrj

How to query SOLR with 98,000 ids without getting maxBooleanClause exception


Our Spring Java website's MySql database has got the following tables

  • CUSTOMER
  • PRODUCT
  • PRODUCT_CUSTOMER_XREF (Customers who bought a product)
  • CUSTOMER_CUSTOMER_XREF (Customers who are friends with other customer)

in SOLR we've indexed all PRODUCTS which have a unique id which is PRODUCT_ID.

My job

  • Display PRODUCTS current logged in CUSTOMER's friends bought in the past.
  • Let customer search for product descriptions and product reviews of Customers friends done in that past

Steps I used

  • SQL Query Get friends of current logged in CUSTOMER ( returns 500 CUSTOMERS)
  • JOIN SQL QUERY Get the PRODUCT(s) purchased by every friend CUSTOMER (returns 98,000 PRODUCTS_ID). This could possibly return 500,000
  • SOLR query id:(1 2 3 4 5 ....98000) returns maxBooleanClause exception.
  • MySQL Query - Get friends of Customer who bought or commented on the returns Product ids.

Problem - solr query

  • I cannot query for 98,000+ products in SOLR because this return maxBooleanClause exception. I can increase the maxBoolean in solr-config.xml file but this seems like a bad practice consider this number be in 100 of thousands.
  • I can add 98,000+ queries for the same field but this sounds like a bad practice aswell.
  • Costly to send 98,000+ ids to SOLR server over HTTP post.

Can someone please help me with a solution ?

Additional Note:I was thinking if there is a way I can group all CUSTOMER_IDs who bought or commented on a particular product into 1 number or token or some sort and then check if the logged in CUSTOMER's ID friends IDs belong to that group. But haven't figured out a formula yet.


Solution

  • I would try something like this ...

    Solr Index

    Person

    • Id
    • Name
    • Friends
    • Products_Purchased

    Products

    • Id
    • Title

      1. 1st Query - fq=friends:id&facet.field=Products_Purchased
      2. This would limit the results to only the Persons, the User is friend with
      3. Also the facet on the products purchased, would be Order by the Count of the Products
      4. This will also make sure you have unique ids for Products.

      5. 2nd Query - I would use these Ids to query solr probably in the batch of 20-50 max with pagination.

      6. Pagination is valid for facets as well, so you can Navigate the results.