Tags: azure, microservices, azure-service-fabric, service-fabric-stateful

Is there an established pattern for paging in Service Fabric Reliable Collections?


In Reliable Collections (specifically IReliableDictionary), one approach to implementing 'common' queries is to maintain a secondary dictionary whose keys are structured so that an enumeration returns them in a specific order. For large data sets, I would like to avoid shuttling a large amount of data around.

To achieve this, I would like to implement some sort of continuation token that the caller can supply when requesting the data. I am currently implementing this by first generating an ordered enumeration and returning the first n items, where n is the MAX_PAGE size. The continuation is essentially the last key in that list of n items. The next time the caller passes in the continuation token, I generate the ordered enumerable with a filter function specifying that the key must be greater than the continuation.
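The scheme described above can be modeled in a few lines. This is only a minimal Python sketch of the paging logic, not Service Fabric code: a plain `dict` stands in for the Reliable Dictionary, `sorted(...)` stands in for the ordered enumeration, and the names `get_page`, `key_filter`, and the `MAX_PAGE` value are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, List, Optional, Tuple

MAX_PAGE = 3  # hypothetical page size, chosen only for illustration


def get_page(
    index: Dict[str, Any],
    continuation: Optional[str] = None,
    page_size: int = MAX_PAGE,
) -> Tuple[List[Tuple[str, Any]], Optional[str]]:
    """Return (items, next_token) for keys strictly greater than the token."""
    # The key filter mirrors the one passed to the ordered enumeration:
    # it is evaluated once per key, as in the approach described above.
    if continuation is None:
        key_filter = lambda k: True
    else:
        key_filter = lambda k: k > continuation

    # sorted(...) is an assumption standing in for an ordered enumeration.
    ordered = (k for k in sorted(index) if key_filter(k))

    # Read one key beyond the page to learn whether another page exists.
    page_keys = list(islice(ordered, page_size + 1))
    has_more = len(page_keys) > page_size
    page_keys = page_keys[:page_size]

    items = [(k, index[k]) for k in page_keys]
    next_token = page_keys[-1] if has_more else None
    return items, next_token
```

Calling `get_page(index)` returns the first page and a token; passing that token back returns the next page. In this model, a key added between calls shows up only if it sorts after the token, which is the first problem listed below.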

This has 2 problems (that I can see):

  1. The collection could change between the caller's first page request and a subsequent request. I'm not certain I can avoid this, since updates to the collection need to be able to occur at any time, regardless of who is attempting to page through the data.
  2. I'm not certain how the filter function is used. I would assume that since a developer could filter on anything, the CreateEnumerableAsync() method must examine every key in the dictionary before returning the enumerable. For a sufficiently large data set, this seems slow.

Are there any prescribed approaches for paging data like this? I am beginning to feel like I might be barking up the wrong tree with Reliable Collections for some of my use cases.


Solution

  • One way to build secondary indices is to use Notifications. Using notifications with a reference type TKey and TValue, you can maintain a secondary index without creating any copies of your TKey or TValue.

    If you need the secondary index to provide snapshot isolation, then the data structure chosen for the secondary index must implement Multi-Version Concurrency Control.

    If you do not have such a data structure to host the secondary index, another option is to keep the transaction and the enumeration live across the paged client calls. This way you can use Reliable Dictionary's built-in snapshot support to provide a consistent paged scan over the data without blocking writes. The token in this case would be the TransactionId, which allows your service to find the relevant enumeration to call MoveNextAsync on. The disadvantage of this option is that Reliable Dictionary will not be able to trim old versions of the values that are kept visible by the potentially long-running snapshot transactions.

    To mitigate the above disadvantage, you would probably want to throttle the number of in-flight snapshot transactions and how long a client has to complete the paged enumeration before your service disposes the enumeration and the relevant read transaction.

    When CreateEnumerableAsync is used with a key filter, Reliable Dictionary will invoke the filter for every key to see if it satisfies the custom filter. Since TKeys are always kept in memory today, for most key filters we have not seen issues here. The most expensive part of an enumeration tends to be retrieving paged-out values from disk.
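
The option of keeping the enumeration alive across page calls, together with the throttling suggested above, can be modeled roughly as follows. This is a language-neutral Python sketch under stated assumptions, not Service Fabric code: the class `SnapshotSessions` and its parameters are hypothetical, a sorted copy of a `dict` stands in for the snapshot view (Reliable Dictionary retains old versions rather than copying), and a random token stands in for the TransactionId.

```python
import time
import uuid
from itertools import islice
from typing import Any, Callable, Dict, Iterator, List, Tuple


class SnapshotSessions:
    """Hypothetical model: one live snapshot enumeration per paging token."""

    def __init__(
        self,
        max_sessions: int = 2,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_sessions
        self._ttl = ttl_seconds
        self._clock = clock
        # token -> (live iterator, deadline for finishing the scan)
        self._sessions: Dict[str, Tuple[Iterator, float]] = {}

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [t for t, (_, d) in self._sessions.items() if d <= now]
        for token in expired:
            # Stands in for disposing the enumeration and its transaction.
            del self._sessions[token]

    def open(self, data: Dict[str, Any]) -> str:
        """Start a scan and return the token the client sends back."""
        self._evict_expired()
        if len(self._sessions) >= self._max:
            raise RuntimeError("too many in-flight snapshot scans")
        snapshot = sorted(data.items())  # assumption: models the snapshot
        token = uuid.uuid4().hex  # assumption: models the TransactionId
        self._sessions[token] = (iter(snapshot), self._clock() + self._ttl)
        return token

    def next_page(self, token: str, page_size: int) -> List[Tuple[str, Any]]:
        """Advance the scan; a short or empty page means it has finished."""
        self._evict_expired()
        if token not in self._sessions:
            raise KeyError("unknown or expired token")
        iterator, _ = self._sessions[token]
        page = list(islice(iterator, page_size))
        if len(page) < page_size:
            del self._sessions[token]  # scan finished, release resources
        return page
```

The deadline is fixed when the scan is opened, so a slow client cannot extend it by requesting pages. Writes made to `data` after `open` do not appear in that scan, which is the consistency the snapshot option buys at the cost of holding resources until the session ends.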