Solr cursor marker and pagination


I want to use Solr for my website as a search engine and I am trying to understand the difference between basic paging and deep paging with cursor marker.

As far as I understand, if you use basic pagination and request page 1001 with 20 results per page, this will happen:

  • Solr will find the first 1000*20 matching results
  • Solr will then display the next 20 results as page 1001

I guess the problem comes when someone clicks "next page": Solr will first find the first 1001*20 results, and only after that show the desired results.

I haven't seen a proper example for deep paging with large numbers. Only with small numbers, so I am not sure about this. Can someone clarify it please?

Is the following example correct?

.../query?q=id:book*&sort=pubyear_i+desc,id+asc&fl=title_t,pubyear_i&rows=1&cursorMark=*

This gives me "nextCursorMark" : "AoJcfCVib29rMg=="

Now that I have the nextCursorMark I can go and find my desired page. Should I now go through the pages manually? Should I create a loop where I search for that particular page I want?

Or should I have the first query with 20000 rows, get the nextCursorMark and then use it with another query having only 20 rows?

I find it a bit strange to run some query with 20000 rows just to get the nextCursorMark. Is it the correct way to do it?

And what if, for example you have 10 pages and the user wants to click on page 5 from page 1. Will I need to go through each page manually to get there?

Edit:

I have read this: How to manage "paging" with Solr?

And this: https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Tried to find a working example but couldn't.


Solution

  • The cursorMark tells Solr where the next response should start. It's analogous to the start parameter in your first example. As you paginate through the results, each response's nextCursorMark tells you where the next page starts; you send it back as the cursorMark of the following request.

    If you're just looking for "what is the first result on page 1001", the first version (start and rows) will work just fine. If you're paginating through the results, where a user may or may not go on to the next page, the point of using cursorMarks is that each node (or the single node, in a single-node setup) knows which document was the last one shown and can therefore return only rows documents from the current position. With the first version, each node would have to return start + rows documents. So instead of having to find out "which documents are the ten after the first 20,000", Solr only needs to answer "which documents are the next ten after this sort key".

    In addition, cursorMarks handle updates to the result set better: changes to the index can't push documents that have already been shown back into the next page you're displaying.

    See the reference guide for complete examples and further descriptions.
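
As a rough illustration of the loop described above, here is a minimal Python sketch of cursor-based iteration. It is not taken from the reference guide: the function names (fetch_page, iterate_with_cursor) and the base URL in the usage note are hypothetical, and it assumes the JSON response format. It relies on three documented behaviours: the first request sends cursorMark=*, the sort must include the uniqueKey field (id in the question's example), and the end of the results is reached when nextCursorMark equals the cursorMark that was sent.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one request to Solr and return the parsed JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iterate_with_cursor(base_url, query, sort, rows=20, fetch=fetch_page):
    """Yield every matching document, fetching `rows` documents per request.

    `fetch` is injectable so the loop can be exercised without a live Solr.
    """
    cursor = "*"  # "*" asks Solr to start at the beginning of the result set
    while True:
        params = {
            "q": query,
            "sort": sort,          # must include the uniqueKey field, e.g. "id asc"
            "rows": rows,
            "cursorMark": cursor,  # do not combine with a non-zero start
            "wt": "json",
        }
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark: no more results
            break
        cursor = next_cursor
```

With the query from the question, a call might look like iterate_with_cursor("http://localhost:8983/solr/books/query", "id:book*", "pubyear_i desc,id asc"), where the host and the core name books are assumptions. Note that a cursor only moves forward from a mark you already hold, so it suits "next page" navigation or full exports; for a jump from page 1 straight to page 5, the start and rows version remains the simpler choice.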