I saw the following code in a thread tutorial:
from time import sleep, perf_counter
from threading import Thread
start = perf_counter()
def foo():
sleep(5)
threads = []
for i in range(100):
t = Thread(target=foo,)
t.start()
threads.append(t)
for i in threads:
i.join()
end = perf_counter()
print(f'Took {end - start}')
When I run it it prints Took 5.014557975
. Okay, that part is fine. It does not take 500 seconds as the non threaded version would.
What I don't understand is how .join
works. I noticed without calling .join
I got Took 0.007060926999999995
which indicates that the main thread ended before the child threads. Since '.join()' is supposed to block, when the first iteration of the loop occurs won't it be blocked and have to wait 5 seconds till the second iteration? How does it still manage to run?
I keep reading python threading is not truly multithreaded and it only appears to be (runs on a single core), but if that is the case then how exactly is the background time running if it's not parallel?
The OS is in control when the thread starts and the OS will context-switch (I believe that is the correct term) between threads.
time
functions access a clock on your computer via the OS - that clock is always running. As long as the OS periodically gives each thread time to access a clock the thread's target can tell if it has been sleeping long enough.
The threads are not running in parallel, the OS periodically gives each one a chance to look at the clock.
Here is a little finer detail for what is happening. I subclassed Thread and overrode its run
and join
methods to log when they are called.
Caveat The documentation specifically states
only override
__init__
andrun
methods
I was surprised overriding join
didn't cause problems.
from time import sleep, perf_counter
from threading import Thread
import pandas as pd
c = {}
def foo(i):
c[i]['foo start'] = perf_counter() - start
sleep(5)
# print(f'{i} - start:{start} end:{perf_counter()}')
c[i]['foo end'] = perf_counter() - start
class Test(Thread):
def __init__(self,*args,**kwargs):
self.i = kwargs['args'][0]
super().__init__(*args,**kwargs)
def run(self):
# print(f'{self.i} - started:{perf_counter()}')
c[self.i]['thread start'] = perf_counter() - start
super().run()
def join(self):
# print(f'{self.i} - joined:{perf_counter()}')
c[self.i]['thread joined'] = perf_counter() - start
super().join()
threads = []
start = perf_counter()
for i in range(10):
c[i] = {}
t = Test(target=foo,args=(i,))
t.start()
threads.append(t)
for i in threads:
i.join()
df = pd.DataFrame(c)
print(df)
0 1 2 3 4 5 6 7 8 9
thread start 0.000729 0.000928 0.001085 0.001245 0.001400 0.001568 0.001730 0.001885 0.002056 0.002215
foo start 0.000732 0.000931 0.001088 0.001248 0.001402 0.001570 0.001732 0.001891 0.002058 0.002217
thread joined 0.002228 5.008274 5.008300 5.008305 5.008323 5.008327 5.008330 5.008333 5.008336 5.008339
foo end 5.008124 5.007982 5.007615 5.007829 5.007672 5.007899 5.007724 5.007758 5.008051 5.007549
Hopefully you can see that all the threads are started in sequence very close together; once thread 0 is joined nothing else happens till it stops (foo
ends) then each of the other threads are joined and terminate.
Sometimes a thread terminates before it is even joined - for threads one plus foo ends
before the thread is joined.