Tags: c#, .net, dictionary, collections, .net-4.6.1

Replacement for .NET Dictionary


Given (Simplified description)

One of our services holds a lot of instances in memory. About 85% of them are unique. We need very fast key-based access to these items, as they are queried very often within a single stack / call. This single context is extremely performance-optimized.

So we started to put them into a dictionary. The performance was OK.

Accessing the items as fast as possible is the most important thing in this case. It is guaranteed that no write operations happen while reads occur.

Problem

In the meantime we hit the limit on the number of items a dictionary can store.

Die Arraydimensionen haben den unterstützten Bereich überschritten. 
  bei System.Collections.Generic.Dictionary`2.Resize(Int32 newSize, Boolean forceNewHashCodes)
  bei System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)

Which translates to "The array dimensions have exceeded the supported range."

Solutions like Memcached are simply too slow in this specific case. It is an isolated, very specific use case encapsulated in a single service.

So we are looking for a replacement for the dictionary for this specific scenario.

Currently I can't find one supporting this. Am I missing something? Can someone point me to one?

As an alternative, if none exists, we are thinking about implementing one ourselves.

We thought about two possibilities: building it from scratch or wrapping multiple dictionaries.

Wrapping multiple dictionaries

When an item is looked up, we could take the key's HashCode and use its leading digits as an index into a list of wrapped dictionaries. Although this seems easy, it smells to me, and it would mean that the hash code is calculated twice (once by us, once by the inner dictionary) (this scenario is really, really performance-critical).
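
To make the idea concrete, here is a minimal sketch of such a wrapper, assuming a generic key type and the read-only-after-load pattern described above. The names (ShardedDictionary, ShardFor) are purely illustrative, not an existing API:

    using System;
    using System.Collections.Generic;

    // Illustrative sketch only: a dictionary split into several inner
    // dictionaries ("shards"), where the key's hash code selects the shard.
    public sealed class ShardedDictionary<TKey, TValue>
    {
        private readonly Dictionary<TKey, TValue>[] _shards;

        public ShardedDictionary(int shardCount, int capacityPerShard = 0)
        {
            _shards = new Dictionary<TKey, TValue>[shardCount];
            for (int i = 0; i < shardCount; i++)
                _shards[i] = new Dictionary<TKey, TValue>(capacityPerShard);
        }

        // GetHashCode is called here to pick the shard and again inside the
        // inner Dictionary -- this is the double calculation mentioned above.
        private Dictionary<TKey, TValue> ShardFor(TKey key)
        {
            int hash = key.GetHashCode() & 0x7FFFFFFF; // drop the sign bit
            return _shards[hash % _shards.Length];
        }

        public void Add(TKey key, TValue value) => ShardFor(key).Add(key, value);

        public bool TryGetValue(TKey key, out TValue value) =>
            ShardFor(key).TryGetValue(key, out value);
    }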

I know that replacing a base type like the dictionary is the absolute last resort and I want to avoid it. But currently it looks like there is no way to make the objects more unique, to get dictionary-like performance from a database, or to save performance somewhere else.

I'm also aware of the usual warnings about premature optimization, but lower performance would hit the business requirements behind this very badly.


Solution

  • Before I finished reading your question, the simple multiple-dictionaries approach came to my mind. But you already know this solution. I am assuming you are really hitting the maximum number of items in a dictionary, not some other limit.

    I would say go for it. I do not think you should be worried about computing a hash twice. If the keys are somehow long and getting the hash is really a time-consuming operation (which I doubt, but can't be sure as you did not mention what the keys are), you do not need to use the whole key for your hash function. Just pick whatever part of the key you are OK with processing in your own hashing and distribute the items based on that.

    The only thing you need to make sure of here is an even spread of items among your multiple dictionaries. How hard this is to achieve really depends on what your keys are. If they were completely random numbers, you could just use the first byte and it would be fine (unless you needed more than 256 dictionaries). If they are not random numbers, you have to think about the distribution in their domain and code your first-level hash function in a way that achieves that goal of even distribution (see the sketch below).
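
    A minimal sketch of that first-byte idea, assuming the keys are uniformly random 64-bit numbers (if they are strings or composite keys, substitute whichever cheap part of the key you trust to be evenly distributed):

      // Illustrative only: distribute keys over the shards using just the
      // top byte of a random 64-bit key; the inner Dictionary still hashes
      // the full key when it does the actual lookup.
      public static class FirstByteSharding
      {
          public static int ShardIndex(ulong key, int shardCount)
          {
              byte firstByte = (byte)(key >> 56);   // cheap: one shift, one cast
              return firstByte % shardCount;        // even spread if keys are uniform
          }
      }

      // Usage (with a ShardedDictionary-style wrapper as in the question):
      //   int i = FirstByteSharding.ShardIndex(id, shards.Length);
      //   shards[i].TryGetValue(id, out var value);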