In Reliable Collections (specifically IReliableDictionary), one approach to implementing 'common' queries is to maintain a secondary dictionary whose keys are structured so that they enumerate in a specific order. For large data sets, I would like to avoid shuttling large amounts of data around.
To achieve this, I would like to implement a continuation token that the caller can supply when requesting the data. I currently do this by generating an ordered enumeration and returning the first n items, where n is the MAX_PAGE size. The continuation token is simply the last key among those n items. When the caller passes the token back, I generate the ordered enumeration again with a filter function requiring the key to be greater than the continuation.
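A minimal sketch of the paging scheme described above, with a plain Python dict standing in for IReliableDictionary. The names `get_page` and `MAX_PAGE` are hypothetical, and each call rebuilds the filtered, ordered enumeration as described rather than using any Service Fabric API.

```python
from typing import Any, Dict, List, Optional, Tuple

MAX_PAGE = 3  # hypothetical page size


def get_page(
    data: Dict[str, Any], continuation: Optional[str], page_size: int = MAX_PAGE
) -> Tuple[List[str], Optional[str]]:
    """Return one page of keys plus the continuation token for the next page."""
    # Rebuild the ordered enumeration, keeping only keys strictly after the token.
    ordered = sorted(k for k in data if continuation is None or k > continuation)
    page = ordered[:page_size]
    # The token is the last key on this page; None means nothing is left.
    token = page[-1] if len(ordered) > page_size else None
    return page, token


# Usage: walk the whole collection page by page.
data = {k: k.upper() for k in "abcdefg"}
token = None
while True:
    page, token = get_page(data, token)
    print(page)
    if token is None:
        break
```

Because every call builds a fresh enumeration, a key inserted between calls that sorts before the current token is never returned, and the `k > continuation` test runs against every key on every page.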
This has 2 problems (that I can see):
Are there any prescribed approaches for paging data like this? I am beginning to feel like I might be barking up the wrong tree with Reliable Collections for some of my use cases.
One way to build secondary indices is to use notifications. If TKey and TValue are reference types, you can maintain a secondary index from notifications without creating any copies of your keys or values.
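A minimal in-memory sketch of the idea, assuming a hypothetical `NotifyingDict` that raises add/update/remove callbacks in place of Reliable Dictionary's own notifications. The hypothetical `SecondaryIndex` keeps its entries sorted by a derived key and stores references to the same value objects rather than copies. It also assumes values are immutable, since the old value is needed to locate the stale index entry on update.

```python
import bisect
from typing import Any, Callable, Dict, List, Tuple


class NotifyingDict:
    """Hypothetical stand-in for a dictionary that raises change notifications."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
        self._listeners: List[Callable[[str, Any, Any, Any], None]] = []

    def subscribe(self, listener: Callable[[str, Any, Any, Any], None]) -> None:
        self._listeners.append(listener)

    def set(self, key: Any, value: Any) -> None:
        kind = "update" if key in self._data else "add"
        old = self._data.get(key)
        self._data[key] = value
        for listener in self._listeners:
            listener(kind, key, old, value)

    def remove(self, key: Any) -> None:
        old = self._data.pop(key)
        for listener in self._listeners:
            listener("remove", key, old, None)


class SecondaryIndex:
    """Sorted (index_key, primary_key) entries that reference, not copy, the values."""

    def __init__(self, key_of: Callable[[Any], Any]) -> None:
        self._key_of = key_of  # derives the secondary ordering key from a value
        self._entries: List[Tuple[Any, Any]] = []
        self._values: Dict[Any, Any] = {}  # primary key -> the same value object

    def on_change(self, kind: str, key: Any, old: Any, new: Any) -> None:
        if kind in ("update", "remove"):
            # Drop the stale entry; assumes the old value was not mutated in place.
            self._entries.remove((self._key_of(old), key))
            del self._values[key]
        if kind in ("add", "update"):
            bisect.insort(self._entries, (self._key_of(new), key))
            self._values[key] = new

    def ordered_values(self) -> List[Any]:
        return [self._values[k] for _, k in self._entries]


# Usage: index people by age.
people = NotifyingDict()
by_age = SecondaryIndex(key_of=lambda v: v["age"])
people.subscribe(by_age.on_change)
alice = {"name": "alice", "age": 30}
people.set("a", alice)
people.set("b", {"name": "bob", "age": 25})
```

Readers of this index see each change as soon as it is applied, so it offers no snapshot isolation on its own.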
If you need the secondary index to provide snapshot isolation, then the data structure chosen for the secondary index must implement Multi-Version Concurrency Control.
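To illustrate what Multi-Version Concurrency Control means for such an index, here is a minimal, single-threaded sketch of a multi-version ordered map (`MvccIndex` is a hypothetical name, not a Service Fabric type). Every write appends a new version instead of overwriting, and a reader pinned to a snapshot version sees only writes committed at or before it.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

_TOMBSTONE = object()  # marks a deletion inside a version chain


class MvccIndex:
    """Single-threaded multi-version ordered map; assumes values are never None."""

    def __init__(self) -> None:
        # key -> [(commit_version, value)], versions strictly increasing
        self._chains: Dict[Any, List[Tuple[int, Any]]] = {}
        self._version = 0

    def _append(self, key: Any, value: Any) -> int:
        # Writes never overwrite: they append a newer version.
        self._version += 1
        self._chains.setdefault(key, []).append((self._version, value))
        return self._version

    def put(self, key: Any, value: Any) -> int:
        return self._append(key, value)

    def delete(self, key: Any) -> int:
        return self._append(key, _TOMBSTONE)

    def snapshot(self) -> int:
        # A snapshot is simply the newest version committed so far.
        return self._version

    def read(self, key: Any, snapshot: int) -> Optional[Any]:
        chain = self._chains.get(key, [])
        # Newest version at or before the snapshot.
        i = bisect.bisect_right([v for v, _ in chain], snapshot)
        if i == 0 or chain[i - 1][1] is _TOMBSTONE:
            return None
        return chain[i - 1][1]

    def scan(self, snapshot: int) -> List[Tuple[Any, Any]]:
        # Ordered scan as of the snapshot; later writes are invisible.
        result = []
        for key in sorted(self._chains):
            value = self.read(key, snapshot)
            if value is not None:
                result.append((key, value))
        return result


# Usage: writes after the snapshot do not affect a scan pinned to it.
idx = MvccIndex()
idx.put("a", 1)
idx.put("b", 2)
snap = idx.snapshot()
idx.put("a", 10)
idx.delete("b")
```

This sketch never discards old versions; a real structure would trim versions older than the oldest live snapshot, which is exactly what a long-running snapshot prevents.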
If you do not have such a data structure to host the secondary index, another option is to keep the transaction and the enumeration alive across the paged client calls. This way you can use Reliable Dictionary's built-in snapshot support to provide a consistent paged scan over the data without blocking writes. The token in this case would be the TransactionId, which lets your service find the relevant enumeration to call MoveNextAsync on. The disadvantage of this option is that Reliable Dictionary cannot trim old versions of values that are kept visible by the potentially long-running snapshot transactions.
To mitigate the above disadvantage, you would probably want to limit the number of in-flight snapshot transactions and the time a client has to complete the paged enumeration before your service disposes of the enumeration and its read transaction.
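A rough sketch of the live-enumeration approach with both mitigations, under several assumptions: `PagedSnapshotScans` is a hypothetical class, an integer counter stands in for the TransactionId, a snapshot is simulated by sorting a copy of the data when the scan opens (Reliable Dictionary provides the snapshot through its own versioning instead), and evicting an entry stands in for disposing of the enumerator and its read transaction.

```python
import itertools
import time
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple


class PagedSnapshotScans:
    """Keeps snapshot enumerations alive across paged client calls (hypothetical)."""

    def __init__(
        self,
        open_snapshot: Callable[[], Iterator[Any]],
        max_in_flight: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._open_snapshot = open_snapshot
        self._max_in_flight = max_in_flight
        self._ttl = ttl_seconds
        self._clock = clock
        self._scans: Dict[int, Tuple[Iterator[Any], float]] = {}
        self._ids = itertools.count(1)  # stand-in for TransactionId

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [t for t, (_, deadline) in self._scans.items() if deadline <= now]
        for token in expired:
            # Stand-in for disposing of the enumerator and its read transaction.
            del self._scans[token]

    def next_page(
        self, token: Optional[int], page_size: int
    ) -> Tuple[List[Any], Optional[int]]:
        self._evict_expired()
        if token is None:
            # Throttle: refuse to open another long-lived snapshot.
            if len(self._scans) >= self._max_in_flight:
                raise RuntimeError("too many in-flight scans; retry later")
            token = next(self._ids)
            # The deadline covers the whole paged enumeration, not each page.
            it, deadline = self._open_snapshot(), self._clock() + self._ttl
        elif token in self._scans:
            it, deadline = self._scans.pop(token)
        else:
            raise KeyError("unknown or expired continuation token")
        page = list(itertools.islice(it, page_size))
        if len(page) < page_size:
            return page, None  # enumeration exhausted; nothing kept alive
        self._scans[token] = (it, deadline)
        return page, token


# Usage: the lambda sorts a copy of the items each time a scan opens.
data = {"a": 1, "b": 2, "c": 3}
scans = PagedSnapshotScans(lambda: iter(sorted(data.items())), max_in_flight=2, ttl_seconds=30)
page, token = scans.next_page(None, 2)
data["aa"] = 99  # written after the snapshot opened, so this scan never sees it
rest, token = scans.next_page(token, 2)
```

A token that arrives after its deadline is rejected and the client has to restart the scan; that is the price of letting the dictionary trim old versions again.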
When CreateEnumerableAsync is called with a key filter, Reliable Dictionary invokes the filter on every key to check whether it matches. Because TKeys are currently always kept in memory, we have not seen issues with most key filters. The most expensive part of an enumeration tends to be retrieving paged-out values from disk.
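A toy model of that cost profile, assuming a hypothetical `LazyValueStore` in which keys sit in a sorted in-memory list and values are fetched from a dict standing in for disk only when a key passes the filter. This is not how Reliable Dictionary is implemented internally; the counters simply show the filter running once per key reached while value reads happen only for matches.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, Tuple


class LazyValueStore:
    """Toy model: keys are always in memory, values are fetched from 'disk' on demand."""

    def __init__(self, disk: Dict[Any, Any]) -> None:
        self._disk = disk  # stands in for paged-out values
        self._keys = sorted(disk)  # in-memory, ordered keys
        self.filter_calls = 0
        self.disk_reads = 0

    def enumerate(self, key_filter: Callable[[Any], bool]) -> Iterator[Tuple[Any, Any]]:
        for key in self._keys:
            self.filter_calls += 1  # the filter sees every key the scan reaches
            if key_filter(key):
                self.disk_reads += 1  # the expensive part: loading the value
                yield key, self._disk[key]


# Usage: take one page of two items after continuation "c".
store = LazyValueStore({k: k.upper() for k in "abcdefgh"})
page = list(itertools.islice(store.enumerate(lambda k: k > "c"), 2))
```

In this model, keys before the continuation cost only a comparison each, while value reads are bounded by the page size.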