python multithreading blocking factorial gil

Why does Python's math.factorial not play nice with threads?

Why does math.factorial act so weird in a thread?

Here is an example. It creates three threads:

- a thread that just sleeps for a while
- a thread that increments an int for a while
- a thread that calls math.factorial on a large number

It calls start() on each thread, then join() with a 2-second timeout.

The sleep and spin threads work as expected: start() returns right away, and the main thread then waits in join() until the timeout expires.

The factorial thread, on the other hand, does not return from start() until it has run to completion!

from __future__ import print_function  # print() behaves the same on Python 2.6+ and 3
from threading import Thread
from time import sleep, time
from math import factorial

# Helper class that stores a start time to compare to
class timed_thread(Thread):
    def __init__(self, time_start):
        Thread.__init__(self)
        self.time_start = time_start

# Thread that just executes sleep()
class sleep_thread(timed_thread):
    def run(self):
        sleep(15)
        # use the stored start time, not the module-level global
        print("st DONE:\t%f" % (time() - self.time_start))

# Thread that increments a number for a while
class spin_thread(timed_thread):
    def run(self):
        x = 1
        while x < 120000000:
            x += 1
        print("sp DONE:\t%f" % (time() - self.time_start))

# Thread that calls math.factorial with a large number
class factorial_thread(timed_thread):
    def run(self):
        factorial(50000)
        print("ft DONE:\t%f" % (time() - self.time_start))

# the tests: start each thread, then join() with a 2-second timeout

print()
print("sleep_thread test")
time_start = time()

st = sleep_thread(time_start)
st.start()
print("st.start:\t%f" % (time() - time_start))
st.join(2)
print("st.join:\t%f" % (time() - time_start))
print("sleep alive:\t%r" % st.is_alive())


print()
print("spin_thread test")
time_start = time()

sp = spin_thread(time_start)
sp.start()
print("sp.start:\t%f" % (time() - time_start))
sp.join(2)
print("sp.join:\t%f" % (time() - time_start))
print("sp alive:\t%r" % sp.is_alive())

print()
print("factorial_thread test")
time_start = time()

ft = factorial_thread(time_start)
ft.start()
print("ft.start:\t%f" % (time() - time_start))
ft.join(2)
print("ft.join:\t%f" % (time() - time_start))
print("ft alive:\t%r" % ft.is_alive())

And here is the output on Python 2.6.5 on CentOS x64:

sleep_thread test
st.start:       0.000675
st.join:        2.006963
sleep alive:    True

spin_thread test
sp.start:       0.000595
sp.join:        2.010066
sp alive:       True

factorial_thread test
ft DONE:        4.475453
ft.start:       4.475589
ft.join:        4.475615
ft alive:       False
st DONE:        10.994519
sp DONE:        12.054668

I've tried this on Python 2.6.5 on CentOS x64 and on Python 2.7.2 on Windows x86, and on both of them the factorial thread does not return from start() until it has finished executing.

I've also tried this with PyPy 1.8.0 on Windows x86, and the result is slightly different: start() does return immediately, but then join() doesn't time out!

sleep_thread test
st.start:       0.001000
st.join:        2.001000
sleep alive:    True

spin_thread test
sp.start:       0.000000
sp DONE:        0.197000
sp.join:        0.236000
sp alive:       False

factorial_thread test
ft.start:       0.032000
ft DONE:        9.011000
ft.join:        9.012000
ft alive:       False
st DONE:        12.763000

I also tried IronPython 2.7.1, which produces the expected result:

sleep_thread test
st.start:       0.023003
st.join:        2.028122
sleep alive:    True

spin_thread test
sp.start:       0.003014
sp.join:        2.003128
sp alive:       True

factorial_thread test
ft.start:       0.002991
ft.join:        2.004105
ft alive:       True
ft DONE:        5.199295
sp DONE:        5.734322
st DONE:        10.998619

Solution

Because of the Global Interpreter Lock (GIL), threads in CPython often only let different pieces of work be interleaved, not run at the same time.
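A minimal sketch of that interleaving, assuming CPython 3.6 or later (the helpers `spin` and `main_thread_ticks` are hypothetical names made up for this illustration). A pure-Python loop like spin_thread.run executes in the bytecode interpreter, which can make the running thread give up the GIL between bytecodes, so the main thread keeps getting turns while the loop runs. In Python 3 that interval is reported by `sys.getswitchinterval()` (default 0.005 seconds); Python 2 instead used `sys.getcheckinterval()`, a count of bytecode instructions (default 100).

```python
import sys
import threading
import time

def spin(limit):
    """Hypothetical CPU-bound pure-Python loop, like spin_thread.run."""
    x = 1
    while x < limit:
        x += 1

def main_thread_ticks(limit):
    """Count how many times the main thread runs while a spin() thread is alive."""
    worker = threading.Thread(target=spin, args=(limit,))
    # Returns promptly: the interpreter can take the GIL away from spin()
    # between bytecodes and hand it back to the main thread.
    worker.start()
    ticks = 0
    while worker.is_alive():
        ticks += 1
        time.sleep(0.001)  # sleeping releases the GIL so the worker can continue
    worker.join()
    return ticks

print("switch interval (s):", sys.getswitchinterval())
print("main-thread ticks during spin:", main_thread_ticks(3_000_000))
```

Any non-zero tick count means the main thread ran while the spin thread was still alive; the exact number depends on the machine and interpreter version.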

If you look at the Python bytecode:

from math import factorial

def fac_test(x):
    factorial(x)

import dis
dis.dis(fac_test)

you get:

  4           0 LOAD_GLOBAL              0 (factorial)
              3 LOAD_FAST                0 (x)
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

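As you can see, the call to math.factorial is a single operation at the Python bytecode level (6 CALL_FUNCTION); the work itself is implemented in C. factorial does not release the GIL while it runs, because it creates and multiplies Python integer objects, which requires holding the GIL. Apart from C code that explicitly releases the GIL (as sleep() and blocking I/O do), the interpreter only switches threads between bytecodes, so no other thread runs until factorial returns. That includes the main thread: Thread.start() waits for the new thread to signal that it has started, and it cannot return from that wait without getting the GIL back, which is why start() itself blocks and you get the result you've observed.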
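For contrast, a sketch (assuming Python 3.4+ for `dis.get_instructions`) that compares the answer's `fac_test` with a hypothetical pure-Python `py_factorial`, which is not from the original answer. `fac_test` has a single call opcode and no jumps, so the whole computation happens inside one instruction; `py_factorial` contains a loop (a backward jump), so the interpreter revisits it on every multiplication and can switch threads between iterations. Each big-int multiply still holds the GIL, but only briefly. Opcode names vary between CPython versions, so the helpers match on name patterns rather than exact names.

```python
import dis
from math import factorial

def fac_test(x):
    """The answer's example: the whole computation is a single C-level call."""
    factorial(x)

def py_factorial(n):
    """Hypothetical pure-Python factorial: one multiply per loop iteration,
    with thread-switch opportunities between iterations (much slower than C)."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def opnames(func):
    """Opcode names in func's bytecode (exact names vary by CPython version)."""
    return [ins.opname for ins in dis.get_instructions(func)]

def call_ops(func):
    # CALL_FUNCTION before 3.11, CALL from 3.11 on (3.11's PRECALL is not counted)
    return [name for name in opnames(func) if name.startswith("CALL")]

def jump_ops(func):
    # A loop needs at least one jump opcode; straight-line code has none
    return [name for name in opnames(func) if "JUMP" in name]

print("fac_test calls:", call_ops(fac_test), "jumps:", jump_ops(fac_test))
print("py_factorial jumps:", jump_ops(py_factorial))
print("same result:", py_factorial(20) == factorial(20))
```

Running `py_factorial(50000)` in place of `factorial(50000)` in the question's factorial_thread should let start() and join(2) behave like the spin thread, at the cost of a much slower computation.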