Tags: c#, performance, collections

.Net collection performance for special use case


I am looking for a good starting point for my use case. My use case is somewhat special, so a simple performance test across all available collections will not do. Performance is everything. Memory consumption can be as high as needed, I do not care.

Note: I have already tried a few things with SortedDictionary instances, but I am wondering whether some other collection would be a better fit. I will also implement a solution myself if there is a superior algorithm for handling this, so there is no need to stick to existing implementations.

The use case is as follows. There is a data structure of the form {BigInteger key1, Guid key2, BigInteger quantity}. The Guid is not random but generated so that each new Guid has a higher value than the last one.

Note: I started with a SortedDictionary using key1 as key and holding another SortedDictionary using key2 as key, but I came to the conclusion that a flatter data structure with only one SortedDictionary using key1+key2 might be better.
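
For illustration, a simplified sketch of such a merged key (key1 sorts first, the Guid breaks ties; the actual MergedKey used in the measurements below may differ, and Guid.CompareTo's ordering must match how the Guids are generated):

```csharp
using System;
using System.Numerics;

// Illustrative merged key: sorts by key1 first, then by key2.
// Guid.CompareTo has its own field-by-field ordering; if the "always
// increasing" Guids are ordered differently, a custom comparison is
// needed for the tie-break instead.
public readonly struct CombinedKey : IComparable<CombinedKey>, IEquatable<CombinedKey>
{
    public CombinedKey(BigInteger key1, Guid key2)
    {
        Key1 = key1;
        Key2 = key2;
    }

    public BigInteger Key1 { get; }
    public Guid Key2 { get; }

    public int CompareTo(CombinedKey other)
    {
        int byKey1 = Key1.CompareTo(other.Key1);
        return byKey1 != 0 ? byKey1 : Key2.CompareTo(other.Key2);
    }

    public bool Equals(CombinedKey other) => Key1 == other.Key1 && Key2 == other.Key2;
    public override bool Equals(object? obj) => obj is CombinedKey other && Equals(other);
    public override int GetHashCode() => HashCode.Combine(Key1, Key2);
}

// Usage: var book = new SortedDictionary<CombinedKey, OrderData>();
```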

The requirements are:

  • Inserts can be random; however, most inserts will happen in the top 100 items sorted by key. Inserts are always single items.
  • Removal will always remove the top n items first, so from a removal point of view the collection behaves like a stack. A remove always removes 1 to n items.
  • Lookups are not random but will return all top n items with keys lower than or equal to the lookup key.
  • Lookups should also filter based on quantity, so the lookup takes a quantity parameter. Items are returned as long as the sum of the quantities of all returned items does not exceed the lookup quantity. In addition, all returned items are removed from the collection.
  • An insert/remove operation always happens after a lookup operation. Possible operation chains are lookup [15%], lookup->insert [15%], lookup->remove [15%], and lookup->remove->insert [55%]. The probabilities are guesses; there is no real way to tell beforehand what will happen more often.
  • The collection will hold between 1 and 100 million items, maybe even more.

Note: I also thought about using a linked list; maybe it is better suited for this than a SortedDictionary.

What would be your approach for this?

Note: Current state: comparing performance between SortedDictionary<BigInteger, SortedDictionary<Guid, OrderData>> and SortedDictionary<MergedKey, OrderData>.

Results so far

| Solution | Number of Items | Inserts / second |
| --- | --- | --- |
| SortedDictionary with embedded SortedDictionary | 1 | 847 |
| SortedDictionary with embedded SortedDictionary | 1M | 2,000,000 |
| SortedDictionary with embedded SortedDictionary | 10M | 87,719 |
| SortedDictionary with embedded SortedDictionary | 100M | 4,677 |
| ImmutableSortedDictionary with embedded ImmutableSortedDictionary | 1 | 410 |
| ImmutableSortedDictionary with embedded ImmutableSortedDictionary | 1M | 666,666 |
| ImmutableSortedDictionary with embedded ImmutableSortedDictionary | 10M | 20,491 |
| ImmutableSortedDictionary with embedded ImmutableSortedDictionary | 100M | 1,406 |
| SortedDictionary with CombinedKey | 1 | 1,288 |
| SortedDictionary with CombinedKey | 1M | 769,230 |
| SortedDictionary with CombinedKey | 10M | 44,444 |
| SortedDictionary with CombinedKey | 100M | 2,557 |
| ImmutableSortedDictionary with CombinedKey | 1 | 2,105 |
| ImmutableSortedDictionary with CombinedKey | 1M | 370,370 |
| ImmutableSortedDictionary with CombinedKey | 10M | 14,204 |
| ImmutableSortedDictionary with CombinedKey | 100M | 1,047 |

Solution

  • Using a sorted array

    Let's consider the algorithmic complexity of your operations using a sorted (dynamic) array. Let's also assume there is a single key used for sorting and lookups.

    Inserts can be random; however, most inserts will happen in the top 100 items sorted by key. Inserts are always single items.

    So the typical case requires moving ~50 items, i.e. O(1), and I would not worry too much about the constant factor for 50 items. The worst case would be O(n). There is also the reallocation penalty for dynamic arrays, but this can be avoided by setting a large enough initial capacity.
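
    As a rough sketch, assuming a List<T> kept sorted descending by key so that the "top" items (the ones removed first) sit at the end, and using an illustrative fixed-size Order type instead of your real data:

    ```csharp
    using System.Collections.Generic;

    // Illustrative item: fixed-size key instead of BigInteger (see "Other notes" below).
    public readonly record struct Order(long Key, long Quantity);

    public static class SortedInsertSketch
    {
        // Sorted descending by key, so the "top" items end up at the end of the list.
        // Flip the comparison if "top" means the largest keys in your ordering.
        public static readonly Comparer<Order> ByKeyDescending =
            Comparer<Order>.Create((a, b) => b.Key.CompareTo(a.Key));

        public static void Insert(List<Order> items, Order order)
        {
            // BinarySearch returns the match index, or the bitwise complement of the
            // index where the item would have to go to keep the list sorted.
            int index = items.BinarySearch(order, ByKeyDescending);
            if (index < 0) index = ~index;

            // Everything from 'index' onwards shifts one slot to the right. Since most
            // inserts land within the top ~100 items, only a few elements move.
            items.Insert(index, order);
        }
    }
    ```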

    Removal will always remove the top n items first, so from a removal point of view the collection behaves like a stack. A remove always removes 1 to n items.

    This should take constant time, i.e. O(1), since you would only be modifying the count property.
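
    With the descending layout from the sketch above, removing the top n items is just a trim at the end of the backing list, for example:

    ```csharp
    using System.Collections.Generic;

    public readonly record struct Order(long Key, long Quantity);

    public static class RemoveSketch
    {
        // The top n items sit at the end of the descending-sorted backing list,
        // so removing them never shifts any remaining elements.
        public static void RemoveTop(List<Order> items, int n)
        {
            items.RemoveRange(items.Count - n, n);
        }
    }
    ```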

    Lookups are not random but will return all top n items with keys lower than or equal to the lookup key.

    This would mean a binary search to find the boundary of the keys at or below the lookup key, O(log n), and then it should be trivial to return a span over that range.
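
    A sketch of that range lookup under the same assumptions; CollectionsMarshal.AsSpan gives a span over the list's backing array without copying, but the span is only valid until the list is modified:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Runtime.InteropServices;

    public readonly record struct Order(long Key, long Quantity);

    public static class RangeLookupSketch
    {
        private static readonly Comparer<Order> ByKeyDescending =
            Comparer<Order>.Create((a, b) => b.Key.CompareTo(a.Key));

        // Returns all items with Key <= maxKey, i.e. the suffix of the
        // descending-sorted list starting at the first qualifying item.
        public static ReadOnlySpan<Order> TopItemsAtOrBelow(List<Order> items, long maxKey)
        {
            // Probe with the lookup key; whether or not an exact match exists,
            // the (complemented) index marks the boundary of the suffix.
            int index = items.BinarySearch(new Order(maxKey, 0), ByKeyDescending);
            if (index < 0) index = ~index;

            // If several items share maxKey, BinarySearch may land anywhere inside
            // that run; step left while the previous item still qualifies.
            while (index > 0 && items[index - 1].Key <= maxKey) index--;

            return CollectionsMarshal.AsSpan(items)[index..];
        }
    }
    ```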

    Lookups should also filter based on quantity, so the lookup takes a quantity parameter. Items are returned as long as the sum of the quantities of all returned items does not exceed the lookup quantity. In addition, all returned items are removed from the collection.

    If I understand this correctly, you still do a binary search as above, followed by iterating over values until adding the next item would push the sum of the quantities past the query value. This should take O(log n + m), where m is the number of items returned.
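
    Something along these lines, again assuming the descending layout; since the qualifying items form a suffix of the list, the key check here simply falls out of walking backwards from the end, and the binary search from the previous step is only needed if you want the full qualifying range up front:

    ```csharp
    using System.Collections.Generic;

    public readonly record struct Order(long Key, long Quantity);

    public static class FillSketch
    {
        // Walks the "top" of the collection (the end of the list) and takes items
        // while their key is <= maxKey and the running total stays within maxQuantity.
        // The taken items are removed from the list and returned, best first.
        public static List<Order> TakeTop(List<Order> items, long maxKey, long maxQuantity)
        {
            var taken = new List<Order>();
            long total = 0;
            int cut = items.Count;

            for (int i = items.Count - 1; i >= 0; i--)
            {
                Order candidate = items[i];
                if (candidate.Key > maxKey) break;                   // key filter
                if (total + candidate.Quantity > maxQuantity) break; // quantity filter

                total += candidate.Quantity;
                taken.Add(candidate);
                cut = i;
            }

            // The matched items are a suffix, so removing them is cheap.
            items.RemoveRange(cut, items.Count - cut);
            return taken;
        }
    }
    ```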

    Conclusion

    Most operations should be O(log n) or better, which is quite good. Even if there are data structures supporting O(1), you need to consider the constant factor. Actual runtime may depend greatly on efficient cache usage.

    As far as I know there is no built-in collection that supports all of these operations efficiently out of the box, so I would probably just roll my own using a List<T> or a plain array as the backing storage; there are existing binary search methods for both List and Array that could be used. Arrays allow lower-level access, but require you to handle growth yourself if that is needed.

    If profiling suggests that inserts take too much time (due to the O(n) worst case), you might want to look at something like a skip list or an unrolled linked list. But I do not have enough experience with these to suggest anything specific.

    Other notes

    I would try to stay away from BigInteger if performance is a concern. For values smaller than 32 bits a BigInteger will use 12 bytes; for values in the 32-64 bit range it will use 36 bytes. BigInteger is also slower, since operations involve loops and branches. So if you can use fixed-size types like Int64 or Int128, it might greatly help with cache usage and performance. If values larger than Int128 exist but are rare, it might be better to handle those as a special case.

    I would probably recommend using a struct instead of a class, to reduce memory usage and indirection. You can use Span<T>, in parameters, etc. to reduce copying overhead.
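
    For instance, something like this compact value type (Int128 requires .NET 7+, and the exact fields are just an assumption about your data):

    ```csharp
    using System;

    // Compact value type: no per-item heap allocation, good cache locality
    // when stored in a List<T> or array.
    public readonly record struct OrderData(Int128 Key1, Guid Key2, long Quantity);

    public static class SpanUsageSketch
    {
        // ReadOnlySpan lets callers hand over a slice of the backing array
        // without copying; 'ref readonly' avoids copying each struct.
        public static Int128 TotalQuantity(ReadOnlySpan<OrderData> orders)
        {
            Int128 total = 0;
            foreach (ref readonly OrderData order in orders)
                total += order.Quantity;
            return total;
        }
    }
    ```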

    I would recommend BenchmarkDotNet for evaluation. This should help ensure good measurements; just make sure to write representative test cases.
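
    A minimal BenchmarkDotNet harness could look like the following; the benchmarked operation is a placeholder to be replaced with your own implementations and representative operation chains:

    ```csharp
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    [MemoryDiagnoser]
    public class InsertBenchmarks
    {
        [Params(1_000_000, 10_000_000)]
        public int ItemCount;

        private SortedDictionary<long, long> _sorted = null!;

        [GlobalSetup]
        public void Setup()
        {
            _sorted = new SortedDictionary<long, long>();
            for (long i = 0; i < ItemCount; i++)
                _sorted[i * 2] = i; // even keys only, so the benchmark inserts a new key
        }

        [Benchmark]
        public void InsertNearTop()
        {
            // Placeholder: insert a single item near the top of the existing keys,
            // then undo it so every iteration sees the same collection size.
            _sorted[ItemCount * 2L - 1] = 42;
            _sorted.Remove(ItemCount * 2L - 1);
        }
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<InsertBenchmarks>();
    }
    ```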