I was wondering if there is any good reference (website or even better, a book) where I can find information about the internal implementation of the commonly used collections like
Dictionary<TKey, TValue>
List<T>
Queue<T>
Stack<T>
By internal implementation, I mean how they use a dynamic array to store their data, how often do they resize it, what is the time and space complexity for the common operations.
Of course, if anybody feels he can provide this information in this thread, you are more than welcome!
List<T>
:List<T>
has two properties, Capacity
and Count
that help clarify when the resizing takes place. Capacity
is the length of the internal array at any given time and Count
is the number of elements added in the List. If you have an estimate of the number of elements to be added in the list, Capacity
can be initialized (by choosing the appropriate constructor), which will lead to fewer or no resizings, and thus better performance.
Resizing (i.e. creation of a new larger array and copy of elements one by one to the new array) happens when Add<T>()
method is called and the array is already full (Count == Capacity
). The Capacity of the new array is doubled (initially, if not set by the user, it starts from 0, then 4 and then it gets doubled each time more space needed):
List<int> list = new List<int>();
//Capacity = 0, Count = 0
list.Add(52);
//Capacity = 4, Count = 1
list.Add(34);
list.Add(2);
list.Add(87);
//Capacity = 4, Count = 4
list.Add(56);
//Capacity = 8, Count = 5
For large n
, the time complexity for adding a new element is amortized constant O(1). The lookup by index is constant O(1), and the insertion or removal of an element at a given index is linear O(n) because it involves shifting the rest of elements one position (to the right or left, respectively). The space used for the internal array is of course linear to the elements of the array, varying from n
to 2n
(or, if this makes sense: Math.Pow(2, Math.Ceiling(Math.Log(n, 2)))
:).
Queue<T>
and Stack<T>
:The resizing of Queue's and Stack's internal array works similarly to what described for the List<T>
.The common operations are efficient O(1) (internally, indexes are kept for Queue's head and tail elements). So, enqueuing an element in the queue or pushing in the stack needs amortized constant time, dequeuing/poping needs constant time.
Dictionary<TKey, TValue>
:Dictionary works differently, and it is nicely described here.