python-3.x · algorithm · math · time-complexity · square-root

Time complexity of Python 3.8+'s integer square root `math.isqrt()` function


I am trying to analyze the time complexity of the built-in integer square root function in Python 3.8+, math.isqrt().

I have tried to work out the time complexity of this function as a function of n when calculating res = math.isqrt(n), where res is the result value that we want and n is an integer.

I believe some of the complications may come from how Python does arithmetic, as well as from its being a dynamically typed language: there may be rounding and buffer underflow/overflow to take care of when converting a float value back to int in successive rounds of calculation, until a stopping condition is met at some threshold precision.

If anyone has worked on the integer square root function math.isqrt for Python, I'd be glad to hear your perspective on the algorithmic complexity analysis, covering both time complexity and space complexity (which may be relevant if there are recursive elements, or if digits after the decimal point are stored for future calculations).


Solution

  • Quick answer: just give me the facts!

    CPython 3.12 introduced some improvements to the speed of division of large integers. As a result, the running time of math.isqrt depends on which version of Python you're using:

    • For CPython < 3.12, the asymptotic time complexity of isqrt(n) is O((log n)^2): that is, it's quadratic time in the number of bits of n.
    • For CPython >= 3.12, it's subquadratic: O((log n)^1.585) (where that constant 1.585 is more properly log(3)/log(2)).
    • In either case, the space required to compute isqrt(n) is linear in the number of bits of n: O(log n).
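A quick sanity check of what is being computed, independent of version: math.isqrt(n) returns the floor of the square root, i.e. the unique a with a*a <= n < (a + 1)**2.

```python
import math

# math.isqrt(n) is floor(sqrt(n)): the unique a with a*a <= n < (a + 1)**2.
for n in (1, 10, 10**6, 10**40 + 7):
    a = math.isqrt(n)
    assert a * a <= n < (a + 1) ** 2
print("floor-sqrt property holds")
```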

    Detailed answer: okay, now can I have an explanation, too?

    The algorithm

    For reference, here's the underlying algorithm expressed in Python (leaving out some irrelevancies like type conversion and error checking):

    def isqrt(n: int) -> int:
        """Find the integer part of the square root of a positive integer n."""
        c = (n.bit_length() - 1) // 2
        a = 1
        for s in reversed(range(c.bit_length())):
            d, e = c >> s, c >> s + 1
            a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
            # at this point a approximates the square root of n >> (2*c - 2*d)
        return a - (a*a > n)
    

    An aside: while this is the form of the algorithm that we want to use for complexity analysis (since it closely matches the actual implementation), the algorithm is rather opaque in this form, so it's not particularly helpful for understanding why it works. The integer square root algorithm is more naturally expressed recursively, something like this:

    def nsqrt(n: int) -> int:
        """For positive n, return a satisfying (a - 1)**2 < n < (a + 1)**2."""
        if n < 4:
            return 1
    
        t = (n.bit_length() + 1) // 4
        a = nsqrt(n >> 2 * t) << t
        return (a + n // a) // 2
    
    def isqrt(n: int) -> int:
        """Find the integer part of the square root of a positive integer n."""
        a = nsqrt(n)
        return a - (a * a > n)
    

    That is, the core of the algorithm is a recursive method for finding what I like to call a "near square root" of a positive integer n: a value a satisfying (a - 1)**2 < n < (a + 1)**2, or equivalently, |a - √n| < 1. That near-square-root algorithm uses the fact that if a is a near square root of n >> 2*t for some nonnegative integer t, then a << t is an approximation to the real square root √n. If we apply a single Newton-Raphson iteration to that approximation, we get a better approximation to √n, and if t is chosen carefully then that better approximation is again a near square root of n.
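To see the Newton-Raphson step in isolation (a tiny illustration, with numbers chosen here for the example, not taken from the implementation): if a approximates √n, then (a + n // a) // 2 is a better approximation.

```python
# One integer Newton-Raphson step, as used in nsqrt above:
# given an approximation a to sqrt(n), (a + n // a) // 2 is a better one.
n = 10**12            # sqrt(n) = 10**6 exactly
a = 1_200_000         # deliberately crude initial guess
better = (a + n // a) // 2
print(better)         # 1016666 -- much closer to 1000000 than 1200000 was
```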

    I leave it to you to check that the two forms of the algorithm are essentially equivalent.
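A brute-force way to gain confidence in that equivalence (the function names loop_isqrt and rec_isqrt are mine, chosen to keep this check self-contained; the bodies are the two versions given above):

```python
import math

def loop_isqrt(n: int) -> int:
    """The iterative form from above."""
    c = (n.bit_length() - 1) // 2
    a = 1
    for s in reversed(range(c.bit_length())):
        d, e = c >> s, c >> s + 1
        a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
    return a - (a*a > n)

def nsqrt(n: int) -> int:
    """The recursive near-square-root from above."""
    if n < 4:
        return 1
    t = (n.bit_length() + 1) // 4
    a = nsqrt(n >> 2 * t) << t
    return (a + n // a) // 2

def rec_isqrt(n: int) -> int:
    a = nsqrt(n)
    return a - (a * a > n)

# Exhaustive check for small n, plus a spot check on a large value.
for n in range(1, 10_000):
    assert loop_isqrt(n) == rec_isqrt(n) == math.isqrt(n)
big = 3**2000 + 7
assert loop_isqrt(big) == rec_isqrt(big) == math.isqrt(big)
print("both forms agree with math.isqrt")
```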

    Complexity analysis

    What makes the complexity analysis relatively easy is that we only have to worry about the last iteration of the for loop. That's because the algorithm progressively increases the working precision, roughly doubling it at each iteration. Given that we're expecting at least linear time (in the number of bits) for each iteration, the last-but-one iteration takes at most half the time of the last iteration, and the last-but-two iteration takes at most a quarter of the time of the last iteration, and so on. So by the magic of geometric progressions, the total accumulated time is at most double the time of the last iteration (plus some O(log n) and O(log log n) book-keeping overhead, which doesn't affect the final complexity).
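That precision doubling can be read straight off the loop: the working precision at each iteration is d = c >> s, which roughly doubles as s counts down. For example, for a 1001-bit n (so c = 500):

```python
# Working precision d = c >> s per iteration, for a 1001-bit n (c = 500).
# Each entry is roughly double the previous one, which is what makes the
# geometric-progression argument work.
n = 2**1000
c = (n.bit_length() - 1) // 2   # 500
precisions = [c >> s for s in reversed(range(c.bit_length()))]
print(precisions)               # [1, 3, 7, 15, 31, 62, 125, 250, 500]
```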

    Now within each iteration, there are only four operations that are working at the level of big integers, and they're all in that a = ... update line: a left shift, a right shift, a division, and an addition. The left shift, right shift and addition all have time linear in the number of bits involved, so the division dominates. (The integers e, d and c all have size of the order of magnitude of log2(n), so the operations involved in computing d - e - 1 or 2 * c - e - d + 1 don't contribute significantly to the time complexity. In the actual C code implementation of math.isqrt, the distinction is clearer, since c, d and e are all native C integers, while n and a are full-blown Python multiprecision integer objects.)

    One slight subtlety with the right shift: note that the time taken grows with the size of the result, not the size of n. That is, getting the top t bits of a b-bit number n takes time proportional to t, not b. That fact's important in establishing the claim that the time taken at least doubles for each iteration.

    So for everything up to the end of the for loop, the time complexity is the same as the time complexity of the division in the final iteration. If you push through the numbers, you can predict the sizes of the integers involved in the division: if n has b bits (that is, n.bit_length() == b), then that division is a division of an integer with roughly 3b/4 bits by an integer with roughly b/4 bits.
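Those operand sizes can be confirmed by instrumenting the loop (isqrt_sizes is a name of my own; the body is the algorithm from above, recording the bit lengths of the final division's numerator and denominator):

```python
import math

def isqrt_sizes(n: int):
    """The isqrt loop from above, instrumented to record the bit lengths of
    the numerator and denominator in the final iteration's division."""
    c = (n.bit_length() - 1) // 2
    a = 1
    sizes = None
    for s in reversed(range(c.bit_length())):
        d, e = c >> s, c >> s + 1
        numerator = n >> 2*c - e - d + 1
        if s == 0:  # final iteration: the division that dominates
            sizes = (numerator.bit_length(), a.bit_length())
        a = (a << d - e - 1) + numerator // a
    return a - (a*a > n), sizes

n = 2**4000 + 12345                        # b = 4001 bits
res, (num_bits, den_bits) = isqrt_sizes(n)
assert res == math.isqrt(n)
print(num_bits, den_bits)                  # roughly 3b/4 and b/4
```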

    The only other thing that might be significant is the final squaring multiplication, which is a multiplication of a roughly b/2-bit integer by itself. It's possible that a big integer arithmetic system could have a division that's asymptotically more efficient than the multiplication, but it's unlikely: in general, division should be expected to have complexity at least that of multiplication. (Fast division algorithms usually involve a multiplication at some point, and Python's is no exception.)

    So to find the complexity, all we have to care about is two operations: the final division, and the multiplication in the correction step.

    To CPython specifics: prior to CPython 3.12, multiplication used the Karatsuba algorithm (beyond some cutoff point), and division was the traditional schoolbook long division method, taking time quadratic in the number of bits of its inputs. So isqrt was dominated by the division, and had complexity O((log n)^2). From CPython 3.12 onwards, multiplication has the same complexity as before, but division now has the same asymptotic complexity as multiplication, so isqrt has complexity matching that of both the multiplication and division algorithms: O((log n)^1.585). See https://github.com/python/cpython/pull/96673 for the gory details of the Python 3.12 changes.
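A rough, machine-dependent way to see this in practice (a micro-benchmark of my own, not part of CPython): time a division shaped like isqrt's final one, with a numerator of about 3b/4 bits and a denominator of about b/4 bits, at doubling sizes. At large enough sizes, successive timings should grow by a factor near 4 on CPython < 3.12 (quadratic) and nearer 3 on CPython >= 3.12 (the Karatsuba exponent); at the modest sizes used here, timer noise may dominate.

```python
import sys
import timeit

def div_time(b: int) -> float:
    """Best-of-3 timing of a division with ~3b/4-bit numerator
    and ~b/4-bit denominator (the shape of isqrt's final division)."""
    num = (1 << (3 * b // 4)) - 1
    den = (1 << (b // 4)) + 1
    return min(timeit.repeat(lambda: num // den, number=5, repeat=3))

timings = [div_time(b) for b in (50_000, 100_000, 200_000)]
print(sys.version_info[:2], [f"{t:.6f}s" for t in timings])
```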

    For the space requirements: the number of auxiliary variables is constant, and their size is bounded by the size of n. So the space requirement is O(log n).

    Interestingly, Juraj Sukop came up with an improvement to the isqrt algorithm that (at least for CPython < 3.12) doubles its speed for large n. Discussion is here: https://github.com/python/cpython/issues/87219. It's not implemented for math.isqrt because the (fairly small) extra complexity involved has an adverse effect for smaller inputs.