
Service fabric reliable dictionary performance with 1 million keys


I am evaluating the performance of Service Fabric with a Reliable Dictionary of ~1 million keys. I'm getting fairly disappointing results, so I wanted to check if either my code or my expectations are wrong.

I have a dictionary initialized with dict = await _stateManager.GetOrAddAsync<IReliableDictionary2<string, string>>("test_"+id);

id is unique for each test run.

I populate it with a list of strings like "1-1-1-1-1-1-1-1-1", "1-1-1-1-1-1-1-1-2", "1-1-1-1-1-1-1-1-3"... up to 576,000 items. The value in the dictionary is not used; I'm currently just using "1".

It takes about 3 minutes to add all the items to the dictionary. I have to split the transaction to 100,000 at a time, otherwise it seems to hang forever (is there a limit to the number of operations in a transaction before you need to CommitAsync()?)

//take100_000 is the next 100_000 in the original list of 576,000
using (var tx = _stateManager.CreateTransaction())
{
    foreach (var tick in take100_000) {
        await dict.AddAsync(tx, tick, "1");
    }
    await tx.CommitAsync();
}

After that, I need to iterate through the dictionary to visit each item:

using (var tx = _stateManager.CreateTransaction())
{
    //ct is a CancellationToken
    var enumerator = (await dict.CreateEnumerableAsync(tx)).GetAsyncEnumerator();

    while (await enumerator.MoveNextAsync(ct))
    {
        var tick = enumerator.Current.Key;
        //do something with tick
    }
}

This takes 16 seconds.

I'm not so concerned about the write time; I know it has to be replicated and persisted. But why does it take so long to read? 576,000 17-character string keys should be no more than 11.5 MB in memory, and the values are only a single character and are ignored. Aren't Reliable Collections cached in RAM? Iterating through a regular Dictionary of the same values takes 13 ms.

I then called ContainsKeyAsync 576,000 times on an empty dictionary (in 1 transaction). This took 112 seconds. Trying this on probably any other data structure would take ~0 ms.

This is on a local 1 node cluster. I got similar results when deployed to Azure.

Are these results plausible? Any configuration I should check? Am I doing something wrong, or are my expectations wildly inaccurate? If so, is there something better suited to these requirements? (~1 million tiny keys, no values, persistent transactional updates)


Solution

  • Ok, for what it's worth:

    • Not everything is stored in memory. To support large Reliable Collections, some values are cached and some reside on disk, which can lead to extra I/O when retrieving the data you request. I've heard a rumor that at some point we may get a chance to adjust the caching policy, but I don't think that has been implemented yet.

    • You iterate through the data reading records one by one. IMHO, if you issue half a million separate sequential queries against any data source, the outcome won't be encouraging. I'm not saying that every single MoveNext() results in a separate I/O operation, but I'd say that overall it doesn't look like a single fetch.

    • It depends on the resources you have. For instance, trying to reproduce your case on my local machine with a single partition and three replicas, I get the records in 5 seconds on average.

    Thinking about a workaround, here is what comes to mind:

    • Chunking: I tried doing the same thing, splitting the records into string arrays capped at 10 elements (IReliableDictionary<string, string[]>). So essentially it was the same amount of data, but the time dropped from 5 seconds down to 7 ms. I guess if you keep your items below 80 KB, thus reducing the number of round-trips and keeping the LOH small, you should see your performance improve.
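
    As a rough sketch of the chunking idea (the chunk size of 10, the key scheme, and the variable names are illustrative assumptions, not the exact code from the test):

```csharp
// Hypothetical sketch: group the tick strings into arrays of 10 and store
// them under an IReliableDictionary<string, string[]> to cut round-trips.
const int chunkSize = 10;

var chunks = allTicks                       // allTicks: the original List<string>
    .Select((tick, i) => new { tick, i })
    .GroupBy(x => x.i / chunkSize)
    .Select(g => g.Select(x => x.tick).ToArray())
    .ToList();

var chunkedDict = await _stateManager
    .GetOrAddAsync<IReliableDictionary2<string, string[]>>("test_chunked_" + id);

using (var tx = _stateManager.CreateTransaction())
{
    for (int i = 0; i < chunks.Count; i++)
    {
        // The chunk index is used as the key here; pick whatever key
        // scheme fits your lookup pattern.
        await chunkedDict.AddAsync(tx, i.ToString(), chunks[i]);
    }
    await tx.CommitAsync();
}
```

    This turns 576,000 individual entries into 57,600, and each enumeration step pulls back 10 keys at once.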

    • Filtering: CreateEnumerableAsync has an overload that lets you specify a key-filter delegate, so values are never retrieved from disk for keys that don't match the filter.
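
    One way the filtering overload might look applied to the enumeration in the question (the prefix predicate is just an illustrative filter; ct is the same CancellationToken as before):

```csharp
using (var tx = _stateManager.CreateTransaction())
{
    // The key filter is applied before values are loaded, so entries whose
    // keys don't match never cause a value read from disk.
    var enumerable = await dict.CreateEnumerableAsync(
        tx,
        key => key.StartsWith("1-1-1-"),   // illustrative predicate
        EnumerationMode.Unordered);

    var enumerator = enumerable.GetAsyncEnumerator();
    while (await enumerator.MoveNextAsync(ct))
    {
        var tick = enumerator.Current.Key;
        // do something with tick
    }
}
```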

    • State Serializer: in case you go beyond simple strings, you could develop your own serializer and try to reduce the I/O incurred for your type.
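
    A minimal sketch of what a custom serializer could look like, assuming a hypothetical Tick type (IStateSerializer<T> lives in Microsoft.ServiceFabric.Data; how you register the serializer can vary by SDK version):

```csharp
public class Tick
{
    public string Key { get; set; }
}

public class TickSerializer : IStateSerializer<Tick>
{
    public Tick Read(BinaryReader reader) => new Tick { Key = reader.ReadString() };

    public void Write(Tick value, BinaryWriter writer) => writer.Write(value.Key);

    // These overloads exist to support differential serialization against a
    // base value; delegating to the simple versions is a reasonable start.
    public Tick Read(Tick baseValue, BinaryReader reader) => Read(reader);

    public void Write(Tick baseValue, Tick targetValue, BinaryWriter writer)
        => Write(targetValue, writer);
}
```

    The serializer has to be registered before any reliable collection that stores Tick is created, e.g. via the state manager in the service constructor.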

    Hopefully it makes sense.