Search code examples
c#.netcomplexity-theorybig-ogeneric-collections

Dictionary<Tkey, TValue>, List<T> and other collections implementation / runtime


I was wondering if there is any good reference (website or even better, a book) where I can find information about the internal implementation of the commonly used collections like

  • Dictionary<TKey, TValue>
  • List<T>
  • Queue<T>
  • Stack<T>
  • etc.

By internal implementation, I mean how they use a dynamic array to store their data, how often do they resize it, what is the time and space complexity for the common operations.

Of course, if anybody feels he can provide this information in this thread, you are more than welcome!


Solution

  • About List<T>:

    List<T> has two properties, Capacity and Count that help clarify when the resizing takes place. Capacity is the length of the internal array at any given time and Count is the number of elements added in the List. If you have an estimate of the number of elements to be added in the list, Capacity can be initialized (by choosing the appropriate constructor), which will lead to fewer or no resizings, and thus better performance.

    Resizing (i.e. creation of a new larger array and copy of elements one by one to the new array) happens when Add<T>() method is called and the array is already full (Count == Capacity). The Capacity of the new array is doubled (initially, if not set by the user, it starts from 0, then 4 and then it gets doubled each time more space needed):

    List<int> list = new List<int>();
    //Capacity = 0, Count = 0
    
    list.Add(52);
    //Capacity = 4, Count = 1
    
    list.Add(34);
    list.Add(2);
    list.Add(87);
    //Capacity = 4, Count = 4
    
    list.Add(56);
    //Capacity = 8, Count = 5
    

    For large n, the time complexity for adding a new element is amortized constant O(1). The lookup by index is constant O(1), and the insertion or removal of an element at a given index is linear O(n) because it involves shifting the rest of elements one position (to the right or left, respectively). The space used for the internal array is of course linear to the elements of the array, varying from n to 2n (or, if this makes sense: Math.Pow(2, Math.Ceiling(Math.Log(n, 2))) :).

    About Queue<T> and Stack<T>:

    The resizing of Queue's and Stack's internal array works similarly to what described for the List<T>.The common operations are efficient O(1) (internally, indexes are kept for Queue's head and tail elements). So, enqueuing an element in the queue or pushing in the stack needs amortized constant time, dequeuing/poping needs constant time.

    About Dictionary<TKey, TValue>:

    Dictionary works differently, and it is nicely described here.