Tags: algorithm, data-structures, time-complexity, amortized-analysis

Why do dynamic arrays specifically double in size when running out of space?


I'm new to amortized analysis. I've noticed that a common practice for dynamic arrays is to double their sizes when running out of space. Is there a specific reason why we choose to double the size? Why not triple or quadruple? Is there a specific explanation for the choice of doubling using amortized analysis? Or is the choice arbitrary?


Solution

  • Growing the array by any constant factor is sufficient to make the total runtime O(n). To see this, note that if the final array size ends up at n and we scale by a factor of m on each step, then the total work done growing the array will be

    1 + m + m^2 + ... + m^(1 + log_m n).

    To see why this is, note that (if the array starts at size one) the array will grow through sizes 1, m, m^2, ..., until it reaches size n. That last growth step happens when m^k = n, which happens when k = log_m n. Factoring in one more growth step to overshoot n accounts for the +1 in the exponent.
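    For a concrete check, take m = 2 and n = 16: the terms are 1 + 2 + 4 + 8 + 16 + 32 = 63, just under the bound of 4n = 64 that falls out of the derivation below.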

    The above is a geometric series, which sums to

    (m^(2 + log_m n) - 1) / (m - 1)

    = (m^2 · n - 1) / (m - 1)

    ≤ n · (m^2 / (m - 1))
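    Here's a quick sketch you can run to check that bound empirically. This is my own illustration, not part of any library; total_copy_work is a made-up helper, and it uses doubles so fractional growth factors work without rounding concerns:

        #include <stdio.h>

        /* Simulate growing an array by factor m, starting at size 1, until
           its capacity passes n, and total up the series terms above.
           Illustrative only; real implementations round capacities to ints. */
        static double total_copy_work(double m, long n) {
            double capacity = 1.0;
            double work = 0.0;
            while (capacity <= (double)n) {
                work += capacity;   /* each growth step copies the current array */
                capacity *= m;
            }
            work += capacity;       /* the overshoot step (the +1 in the exponent) */
            return work;
        }

        int main(void) {
            const long n = 1000000;
            const double factors[] = {1.25, 1.5, 2.0, 3.0, 4.0};
            for (int i = 0; i < 5; i++) {
                double m = factors[i];
                printf("m = %.2f: work = %10.0f, bound n*m^2/(m-1) = %10.0f\n",
                       m, total_copy_work(m, n), n * m * m / (m - 1.0));
            }
            return 0;
        }

    For m = 2 this prints roughly 2n of work against the 4n bound; the bound is loose when n isn't an exact power of m, but it always holds.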

    So basically any scaling factor greater than one works, but the leading coefficient depends on which m we pick. For large m, this coefficient is approximately equal to m, and we end up wasting a lot of effort and space growing the array. As m gets closer to one, the denominator (m - 1) approaches zero, so the coefficient blows up and becomes more of a concern.

    Picking m = 2 gives a leading coefficient of 4, which is pretty low. Picking 1.5 gives a leading coefficient of 4.5. That’s higher, but not by much. However, picking 1.5 has some other advantages:

    • The allocated array, as we keep growing it, is never more than 50% bigger than what we had before. That reduces the memory overhead of the data structure compared with doubling.
    • If we need to grow the array, the sum of the sizes of the previously allocated arrays exceeds the size of the new array (try it: powers of two don't have this property). That makes it more likely that the memory allocator can recycle the space of older, discarded arrays to fit the new one; the sketch after this list checks this.
    • Multiplying by 1.5 can be done by computing size + (size >> 1), which is extremely cheap on a processor compared with a multiply.
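    As a rough illustration (my own sketch, not taken from any particular library, though OpenJDK's ArrayList grows with exactly this size + (size >> 1) idiom), here is the shift-based growth together with a check of the recycling property from the second bullet:

        #include <stdio.h>

        /* Grow a capacity by roughly 1.5x with an add and a shift.
           Illustrative helper; a real implementation would also guard
           against overflow and enforce a minimum capacity. */
        static size_t grow_capacity(size_t size) {
            return size + (size >> 1);
        }

        int main(void) {
            size_t size = 4;         /* assumed starting capacity */
            size_t freed_total = 0;  /* sizes of all arrays discarded so far */
            for (int step = 0; step < 10; step++) {
                size_t next = grow_capacity(size);
                freed_total += size; /* the old array is freed after the copy */
                printf("new size %4zu, freed so far %4zu -> %s\n",
                       next, freed_total,
                       freed_total >= next ? "recycled space could fit it"
                                           : "does not fit yet");
                size = next;
            }
            return 0;
        }

    After the first couple of steps the freed space always exceeds the new size; with doubling it never does, since 1 + 2 + ... + 2^(k-1) = 2^k - 1 falls one element short of the next array.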

    Hope this helps!