
Service fabric reliable dictionary performance with 1 million keys


I am evaluating the performance of Service Fabric with a Reliable Dictionary of ~1 million keys. I'm getting fairly disappointing results, so I wanted to check if either my code or my expectations are wrong.

I have a dictionary initialized with dict = await _stateManager.GetOrAddAsync<IReliableDictionary2<string, string>>("test_"+id);

id is unique for each test run.

I populate it with a list of strings like "1-1-1-1-1-1-1-1-1", "1-1-1-1-1-1-1-1-2", "1-1-1-1-1-1-1-1-3"... up to 576,000 items. The value in the dictionary is not used; I'm currently just using "1".

It takes about 3 minutes to add all the items to the dictionary. I have to split the transaction to 100,000 at a time, otherwise it seems to hang forever (is there a limit to the number of operations in a transaction before you need to CommitAsync()?)

//take100_000 is the next 100_000 in the original list of 576,000
using (var tx = _stateManager.CreateTransaction())
{
    foreach (var tick in take100_000) {
        await dict.AddAsync(tx, tick, "1");
    }
    await tx.CommitAsync();
}

After that, I need to iterate through the dictionary to visit each item:

using (var tx = _stateManager.CreateTransaction())
{
    //ct is a CancellationToken
    var enumerator = (await dict.CreateEnumerableAsync(tx)).GetAsyncEnumerator();

    while (await enumerator.MoveNextAsync(ct))
    {
        var tick = enumerator.Current.Key;
        //do something with tick
    }
}

This takes 16 seconds.

I'm not so concerned about the write time; I know it has to be replicated and persisted. But why does it take so long to read? 576,000 17-character string keys should be no more than 11.5 MB in memory, and the values are only a single character and are ignored. Aren't Reliable Collections cached in RAM? Iterating through a regular Dictionary of the same values takes 13 ms.

I then called ContainsKeyAsync 576,000 times on an empty dictionary (in 1 transaction). This took 112 seconds. Trying this on probably any other data structure would take ~0 ms.

This is on a local 1 node cluster. I got similar results when deployed to Azure.

Are these results plausible? Any configuration I should check? Am I doing something wrong, or are my expectations wildly inaccurate? If so, is there something better suited to these requirements? (~1 million tiny keys, no values, persistent transactional updates)


Solution

  • Ok, for what it's worth:

    • Not everything is stored in memory. To support large Reliable Collections, some values are cached and some reside on disk, which can lead to extra I/O when retrieving the data you request. I've heard a rumor that at some point we may get a chance to adjust the caching policy, but I don't think that has been implemented yet.

    • You iterate through the data reading records one by one. IMHO, if you issue half a million separate sequential queries against any data source, the outcome won't be encouraging. I'm not saying that every single MoveNext() results in a separate I/O operation, but I'd say that overall it doesn't look like a single fetch.

    • It depends on the resources you have. For instance, trying to reproduce your case on my local machine with a single partition and three replicas, I get the records in 5 seconds on average.

    Thinking about a workaround, here is what comes to mind:

    • Chunking: I tried doing the same thing, splitting the records into string arrays capped at 10 elements (IReliableDictionary<string, string[]>). So essentially it was the same amount of data, but the time dropped from 5 seconds down to 7 ms. I guess if you keep your items below 80 KB, thus reducing the number of round-trips and keeping the LOH small, you should see your performance improve.
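
    As a rough sketch of the chunking idea (the chunk size of 10, the key scheme, and the variable names are illustrative assumptions, not the exact code from the test):

```csharp
// Hypothetical sketch: group the tick strings into arrays of 10 and store
// them under an IReliableDictionary<string, string[]> to cut round-trips.
const int chunkSize = 10;

var chunks = allTicks                       // allTicks: the original List<string>
    .Select((tick, i) => new { tick, i })
    .GroupBy(x => x.i / chunkSize)
    .Select(g => g.Select(x => x.tick).ToArray())
    .ToList();

var chunkedDict = await _stateManager
    .GetOrAddAsync<IReliableDictionary2<string, string[]>>("test_chunked_" + id);

using (var tx = _stateManager.CreateTransaction())
{
    for (int i = 0; i < chunks.Count; i++)
    {
        // The chunk index is used as the key here; pick whatever key
        // scheme fits your lookup pattern.
        await chunkedDict.AddAsync(tx, i.ToString(), chunks[i]);
    }
    await tx.CommitAsync();
}
```

    This turns 576,000 individual entries into 57,600, and each enumeration step pulls back 10 keys at once.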

    • Filtering: CreateEnumerableAsync has an overload that lets you specify a key-filter delegate, so values are never retrieved from disk for keys that don't match the filter.
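
    One way the filtering overload might look applied to the enumeration in the question (the prefix predicate is just an illustrative filter; ct is the same CancellationToken as before):

```csharp
using (var tx = _stateManager.CreateTransaction())
{
    // The key filter is applied before values are loaded, so entries whose
    // keys don't match never cause a value read from disk.
    var enumerable = await dict.CreateEnumerableAsync(
        tx,
        key => key.StartsWith("1-1-1-"),   // illustrative predicate
        EnumerationMode.Unordered);

    var enumerator = enumerable.GetAsyncEnumerator();
    while (await enumerator.MoveNextAsync(ct))
    {
        var tick = enumerator.Current.Key;
        // do something with tick
    }
}
```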

    • State Serializer: in case you go beyond simple strings, you could develop your own serializer and try to reduce the I/O incurred for your type.
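
    A minimal sketch of what a custom serializer could look like, assuming a hypothetical Tick type (IStateSerializer<T> lives in Microsoft.ServiceFabric.Data; how you register the serializer can vary by SDK version):

```csharp
public class Tick
{
    public string Key { get; set; }
}

public class TickSerializer : IStateSerializer<Tick>
{
    public Tick Read(BinaryReader reader) => new Tick { Key = reader.ReadString() };

    public void Write(Tick value, BinaryWriter writer) => writer.Write(value.Key);

    // These overloads exist to support differential serialization against a
    // base value; delegating to the simple versions is a reasonable start.
    public Tick Read(Tick baseValue, BinaryReader reader) => Read(reader);

    public void Write(Tick baseValue, Tick targetValue, BinaryWriter writer)
        => Write(targetValue, writer);
}
```

    The serializer has to be registered before any reliable collection that stores Tick is created, e.g. via the state manager in the service constructor.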

    Hopefully it makes sense.