In introduction to Algorithms there is the following question
Binary search of a sorted array takes logarithmic search time, but the time to insert a new element is linear in the size of the array. We can improve the time for insertion by keeping several sorted arrays.
Specifically, suppose that we wish to support SEARCH and INSERT on a set of n elements. Let k equal the ceiling of lg(n + 1) , and let the binary representation of n be (nk-1, nk-2 ... n0). We have k sorted arrays (A0, A1, Ak-1) where for i = (0, 1, ... k-1), the length of array Ai is 2i. Each array is either full or empty, depending on whether ni = 1 or ni = 0, respectively. The total number of elements held in all k arrays is therefore the sum of ni * 2i from 0 to k-1 which equals n. Although each individual array is sorted, elements in different arrays bear no particular relationship to each other.
I am struggling to understand the data structure it is describing. Specifically I am confused by the line "and let the binary representation of n be (nk-1, nk-2 ... n0)" I understand what binary representation is, however, I am confused about "(nk-1, nk-2 ... n0)". Isn't n already defined? Are there k-1 variants of n? or is n limited by what k is? Are we just looking at k-1 digits of the binary?
What does this line mean?
Also how does it relate to "Each array is either full or empty, depending on whether ni = 1 or ni = 0"?
For example if I had the Array A = {0, 1, 2, 3, 4, 5, 6}
I understand that k = 3 and therefore I have 3 arrays A0, A1, A2 with the lengths of 1, 2, 4. However I do not understand what the content of these arrays would be.
What would the content be for arrays A0, A1, A2 be?
The n items are distributed arbitrarily among the full arrays. The only constraint is that the number of items in array i, if it is full, is 2i.
I'm sure (or at least I hope!) that there is lots of text you didn't paste in, although the text you quoted is enough to figure out what's going on.
You have to think about what happens to the binary representation of n as you add items.
Lets say you have 11 items, so n=11. The binary representation of 11 is 1011
, so A0, A1, and A3 are full, and A2 is empty. The number of items in each full array is equal to its binary place value, so the total number of items is 11.
Now when we insert an item, n will change to 12. The binary representation is now 1100
. A0 and A1 are therefore emptied out, and A2 is filled with, of course, the elements we emptied out of A0 and A1, as well as the new item.
Every time you insert, there may be some full arrays that must be emptied, and there is one empty array that must be filled with their items and the new one, in sorted order. This takes time proportional to the number of items moved.
The amortized cost of insertion is constant for every array, so the total amortized cost of insertion is O(log N).
The cost of search is O(log2 N), because you have to search in all the full arrays.