Python: how so fast?

The period of the Mersenne Twister used in the module random is (I am told) 2**19937 - 1. As a binary number, that is 19937 '1's in a row (if I'm not mistaken). Python converts it to decimal pretty darned fast:

$ python -m timeit '2**19937'
10000000 loops, best of 3: 0.0271 usec per loop

$ python -m timeit -s 'result = 0' 'result += 2**19937'
100000 loops, best of 3: 2.09 usec per loop

I guess the second version is the one that requires conversion?

And it's not just binary. This is also fast. (Rather than show the numbers, I show the length of the decimal converted to a string):

>>> import math
>>> N = 1000
>>> s = str((int(N*math.e))**(int(N*math.pi)))
>>> len(s)
10787
>>> N = 5000
>>> s = str((int(N*math.e))**(int(N*math.pi)))
>>> len(s)
64921

Timing:

python -m timeit -s 'import math' -s 'N=1000' 's = str((int(N*math.e))**(int(N*math.pi)))'
10 loops, best of 3: 51.2 msec per loop

The question is: how is this actually done?

Am I just naive to be impressed? I find the sight of the Python shell generating a number of 5000 or so places in an instant truly spectacular.

Edit:

Additional timings suggested by @dalke and @truppo

$ python -m timeit 'x=2' 'x**19937'
1000 loops, best of 3: 230 usec per loop
$ python -m timeit 'x=2' 'int(x**19937)'
1000 loops, best of 3: 232 usec per loop
$ python -m timeit 'x=2' 'str(x**19937)'
100 loops, best of 3: 16.6 msec per loop

$ python -m timeit -s 'result = 0' 'x = 2' 'result += x**19937'
1000 loops, best of 3: 237 usec per loop
$ python -m timeit -s 'result = 0' 'x = 2' 'result += x**19937' 'int(result)'
1000 loops, best of 3: 238 usec per loop
$ python -m timeit -s 'result = 0' 'x = 2' 'result += x**19937' 'str(result)'
100 loops, best of 3: 16.6 msec per loop

So it looks to me like result = 0; result += 2**19937 probably does force the conversion.

Solution

Python converts it to decimal pretty darned fast.

I don't know Python, but no, it needn't to do that. 2^19937 don't need computations, it is simply a binary shift ("<<") with 19937, so it is very fast. Only if you print that in decimal the actual conversion is necessary and that is much slower.

EDIT: Exponentiation can be the same as shifting (=moving the point) if the number base is identical to the exponent base.

10^-1 = 0.1 10^0 = 1
10^1 = 10
10^2 = 100
10^3 = 1000
10^n = 1 (n zeroes)

You see that exponentiation of 10 with the exponent n simply shifts the number. Now computers mostly use the internal base 2 (bits) , so calculating 2^19937 is setting a bit in position 19937 (counting bit positions from zero).
As additional information: The conversion to decimal is normally implemented by a conquer-and-divide algorithm which successively divides the number by powers of ten. As you see, the actual conversion is slower by a factor of half a million.

The second example is more interesting: As you are computing m^n with large integers m,n the fastest way is multiplying it in succession and store the temporary results.

Example: 10^345

a = 10^2
b = aa = 10^4
c = bb = 10^16
d = c*c = 10^256

result = dccccccccbb*10

(Can be further optimized: see Knuth, Seminumerical Algorithms)

So you only need long multiplications and they can be computed pretty effectively.

EDIT: The exact implementation of multiplication depends: Besides the normal school multiplication Karatsuba, Tom-Cooke and Schoenhagen-Strasse (FFT) multiplication is used.