Search code examples
c#dictionarypreserveinsertion-order

Does Dictionary<string, int> preserves the insertion order?


I have been using C# for more than 10 years, but today I discovered that Dictionary<string, int> retains the insertion order? As far as I know, Dictionary is implemented internally using a hash table, so the internal arrangement should be in no particular order, but the test code below seems to show that the order during traversal is exactly the same as the order of insertion?

        const int N = 1000000;

        List<string> ss = new List<string>();
        Dictionary<string, int> dict = new Dictionary<string, int>();

        for (int i = 0; i < N; ++i)
        {
            var s = string.Empty;
            for (int j = 0; j < 15; ++j)
                s += (char)('a' + UnityEngine.Random.Range(0, 26));

            //ss.add(s);
            ss.Add(new string(s));

            //dict[s] = UnityEngine.Random.Range(0, 1024);
            dict.Add(s, UnityEngine.Random.Range(0, 1024));
        }

        int same = 0;

        var keys = dict.Keys.ToArray();

        for (int i = 0; i < N; ++i)
        {
            if (keys[i] == ss[i])
                ++same;
        }

        Debug.Log($"N={N}, same={same}");
        
        // the output is : N=1000000, same=1000000

Now I wonder if there is something wrong with my test code?

Can anyone confirm that the traversal order of Dictionary in C# is the insertion order?

Is it reliable?


Solution

  • I looked at the implementation of Dictionary<TKey, TValue> (by clicking Alt-F12 with the text cursor on Dictionary<string, int> in Visual Studio). I do not understand it fully; however, I noticed that two arrays are used to store the entries:

    private int[]? _buckets;
    private Entry[]? _entries;
    

    The _buckets are indeed accessed through the hash code. I.e., this is the hash table. By debugging I noticed that the entries are added in a random order into _buckets but in a sequential order into _entries. _buckets contains an index into _entries. This explains why the entries can be retrieved in the same order as they have been added to the dictionary.

    This behavior is due to implementation detail. So, I wouldn't rely on it as the implementation could change in future releases. In fact, the implementation of Dictionary<TKey, TValue> has changed in the past. Also, I don't know if this order is always maintained, or if there are conditions where it might be broken.


    Test setup: .NET 8.0 Console Application / C# 12

    public static void Test()
    {
        const int N = 1000000;
    
        List<string> list = [];
        Dictionary<string, int> dict = [];
        Type type = dict.GetType();
    
        for (int i = 0; i < N; ++i) {
            string s = String.Empty;
            for (int j = 0; j < 4; ++j) {
                s += (char)('a' + Random.Shared.Next(26));
            }
            list.Add(s);
            dict[s] = i; // Adding i simplifies analysis, as _entries[i].value now reflects the order.
    
            // Retrieve the dictionary's private fields at every iteration, since they are reallocated when they are resized.
            int[]? _buckets = (int[]?)type.GetField("_buckets", BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(dict);
            object? _entries = type.GetField("_entries", BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(dict);
        } // <<< place a breakpoint here and inspect _buckets and _entries.
    
        int same = 0;
        string[] keys = dict.Keys.ToArray();
        for (int i = 0; i < keys.Length; ++i) {
            if (keys[i] == list[i]) {
                ++same;
            }
        }
    
        Console.WriteLine($"N={keys.Length}, same={same}");
    }