I am preparing a session for my team on threading and locking in Python, and I've run into a situation I don't fully understand.
In the code below, I'm computing hashes of 1000 large strings (100k+ chars) using a ThreadPool of 20 threads. The hashes are then stored in the dictionary digests, so I use a lock when writing to it (I suspect the lock may not actually be necessary here, but let's suppose it is).
Version A) performs the expensive hash computation inside the lock statement; version B) performs it before acquiring the lock, and only updates the dictionary with the result inside the critical section.
import threading
import time
from multiprocessing.pool import ThreadPool
import hashlib
# A) computation is within the lock statement
lock = threading.Lock()
digests = {}
def compute_digests(x):
    s = '*' * (x + 100000)  # generate some big string
    with lock:
        digests[x] = hashlib.sha256(f'{s}'.encode()).hexdigest()
tic = time.time()
ThreadPool(20).map(compute_digests, range(1000))
toc = time.time()
print(f'Computation in locked area: {toc - tic}s')
# B) computation is outside of the lock statement
lock = threading.Lock()
digests = {}
def compute_digests(x):
    s = '*' * (x + 100000)  # generate some big string
    digest = hashlib.sha256(f'{s}'.encode()).hexdigest()
    with lock:
        digests[x] = digest
tic = time.time()
ThreadPool(20).map(compute_digests, range(1000))
toc = time.time()
print(f'Computation outside of locked area: {toc - tic}s')
The results are:
Computation in locked area: 0.41937875747680664s
Computation outside of locked area: 0.10702204704284668s
In other words, option B) is faster. That may seem intuitive given that we moved the expensive computation outside of the locked block, however, based on what I've read, Python is effectively single-threaded anyway and a ThreadPool only gives the appearance of parallelism - in reality only one computation runs at any moment. In other words, I would expect the Global Interpreter Lock to be the bottleneck, yet version B) shows a substantial speedup!
So where is that speedup coming from? Does it have something to do with the implementation of sha256 (perhaps it sleeps somewhere)?
Python is not single-threaded. CPython uses normal operating-system threads, just as C++ or Java code would. The difference is the global interpreter lock (GIL), which allows only one thread to execute Python bytecode at a time. However, C extension code can release the GIL while it does its work, and hashlib does exactly that when hashing large inputs - so the sha256 computations genuinely run in parallel across threads.
In version A), the interpreter would have been free to run the hash computations concurrently, but your lock serializes them anyway. In version B), only the cheap dictionary update is serialized, so the expensive hashing overlaps across the 20 threads.
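You can see this directly by comparing sequential hashing against threaded hashing of the same large buffer. This is a minimal sketch (pool size and input size are illustrative; exact timings will vary by machine), but on a multi-core machine the threaded run is typically noticeably faster, which is only possible if hashlib releases the GIL during the digest:

```python
import hashlib
import time
from multiprocessing.pool import ThreadPool

data = b'*' * 1_000_000  # large enough that hashlib releases the GIL while hashing

def hash_once(_):
    return hashlib.sha256(data).hexdigest()

# Sequential baseline: one hash after another on the main thread
tic = time.time()
seq = [hash_once(i) for i in range(100)]
seq_time = time.time() - tic

# Threaded: the GIL is dropped inside sha256, so the hashes overlap
tic = time.time()
par = ThreadPool(4).map(hash_once, range(100))
par_time = time.time() - tic

print(f'sequential: {seq_time:.3f}s, threaded: {par_time:.3f}s')
```

Both runs produce identical digests; only the wall-clock time differs. If you replace the hashing with a pure-Python loop, the threaded version shows no such speedup, because pure Python bytecode never runs concurrently under the GIL.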