I'm new to amortized analysis. I've noticed that a common practice for dynamic arrays is to double their sizes when running out of space. Is there a specific reason why we choose to double the size? Why not triple or quadruple? Is there a specific explanation for the choice of doubling using amortized analysis? Or is the choice arbitrary?
Growing the array by any constant factor greater than one is sufficient to make the total time for n appends O(n), i.e. amortized O(1) per append. To see this, note that if the final array size ends up at n and we scale by a factor of m on each step, then the total work done growing the array will be
$$1 + m + m^2 + \cdots + m^{1 + \log_m n}.$$
To see why, note that if the array starts at size one, it will grow through sizes $1, m, m^2, \ldots$ until it reaches size $n$. That last growth step happens when $m^k = n$, i.e. when $k = \log_m n$. Factoring in one more growth step to overshoot $n$ accounts for the $+1$ here.
The above is a geometric series, which sums to

$$\frac{m^{2 + \log_m n} - 1}{m - 1} = \frac{m^2 n - 1}{m - 1} \le n \cdot \frac{m^2}{m - 1},$$

where the middle step uses the fact that $m^{\log_m n} = n$.
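If you'd like to sanity-check that bound empirically rather than take the algebra on faith, here's a minimal sketch (the GROWTH constant, the copies counter, and the printed bound are all my own additions, purely for illustration) of a dynamic array that regrows by a constant factor and counts how many elements it copies in total:

```cpp
#include <cstddef>
#include <cstdio>

// Growth factor to experiment with -- try 1.5, 2.0, 3.0, 4.0, ...
static const double GROWTH = 2.0;

// Minimal dynamic array of ints. "copies" tracks the total number of
// elements moved during regrowth, i.e. the work spent growing the array.
struct DynArray {
    int*   data     = nullptr;
    size_t size     = 0;
    size_t capacity = 0;
    size_t copies   = 0;
};

void push_back(DynArray& a, int value) {
    if (a.size == a.capacity) {
        // Scale the capacity by a constant factor.
        size_t newCap = (size_t)(a.capacity * GROWTH);
        if (newCap <= a.capacity) newCap = a.capacity + 1;  // guard for tiny capacities
        int* bigger = new int[newCap];
        for (size_t i = 0; i < a.size; ++i) bigger[i] = a.data[i];
        a.copies += a.size;                 // this regrow copied a.size elements
        delete[] a.data;
        a.data = bigger;
        a.capacity = newCap;
    }
    a.data[a.size++] = value;
}

int main() {
    DynArray a;
    const size_t n = 1000000;
    for (size_t i = 0; i < n; ++i) push_back(a, (int)i);

    // The sum above bounds the total copy work by n * m^2 / (m - 1).
    printf("n = %zu  copies = %zu  copies/n = %.2f  bound m^2/(m-1) = %.2f\n",
           n, a.copies, (double)a.copies / n,
           GROWTH * GROWTH / (GROWTH - 1.0));
    delete[] a.data;
}
```

Running it with a few different GROWTH values shows the copies/n ratio staying comfortably under the $m^2/(m-1)$ coefficient.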
So basically any growth factor greater than one works, but the leading coefficient depends on which m we pick. For large m, this coefficient is approximately equal to m, and we end up wasting a lot of effort and space growing the array. As m gets closer to one, the denominator $m - 1$ gets closer and closer to zero, so the coefficient blows up instead.
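To put some numbers on that trade-off, here's a quick sketch (the list of sample factors is my own choice) that just evaluates the coefficient $m^2/(m-1)$ for a few values of m:

```cpp
#include <cstdio>

// Evaluate the leading coefficient m^2 / (m - 1) from the bound above
// for a handful of growth factors, to see both ends of the trade-off.
int main() {
    const double factors[] = {1.1, 1.25, 1.5, 2.0, 3.0, 4.0, 8.0};
    printf("%6s  %10s\n", "m", "m^2/(m-1)");
    for (double m : factors) {
        printf("%6.2f  %10.2f\n", m, m * m / (m - 1.0));
    }
}
```

With these samples the coefficient works out to roughly 12.1, 6.25, 4.5, 4, 4.5, 5.33, and 9.14, so for this particular bound it happens to bottom out right around m = 2.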
Picking m = 2 gives a leading coefficient of 4, which is pretty low. Picking m = 1.5 gives a leading coefficient of 4.5. That’s higher, but not by much. However, picking 1.5 has some other advantages; for example, the new size can be computed as

    size + (size >> 1)

which is extremely cheap on a processor compared with a multiply (see the sketch below). Hope this helps!
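Here is that last trick spelled out; the helper name growByHalf is made up purely for illustration:

```cpp
#include <cstddef>
#include <cstdio>

// Illustrative only -- growByHalf isn't from any particular library.
// A 1.5x capacity bump done with a single integer add and shift:
// no multiply instruction and no floating point, even though the
// growth factor isn't a whole number.
size_t growByHalf(size_t size) {
    return size + (size >> 1);
}

int main() {
    size_t cap = 16;
    for (int i = 0; i < 5; ++i) {
        printf("%zu ", cap);    // prints: 16 24 36 54 81
        cap = growByHalf(cap);
    }
    printf("\n");
}
```

The point isn't that doubling needs a multiply (it's a single shift), but that even a fractional factor like 1.5 can be applied with plain integer operations.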