I have a very long list, data. Let's assume that it looks like this:
[(a, a, 1),
(b, b, 1),
(c, c, 1),
(d, d, 1),
(e, e, 1),
(f, f, 1),
(g, g, 1),
(h, h, 1),
(i, i, 1),]
I am trying to use multithreading as follows:
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
pool.starmap(help_func, data)
help_func is as follows:
def help_func(in_vala, in_valb, in_valc):
    print("asking for " + str(in_vala) + " asking for " + str(in_valb))
    receiver(in_vala)
and receiver is a simple test function as so:
def receiver(group):
    print(group)
When I run my program, I can see that the output from help_func is correct, i.e., it enumerates the values of data.
However, when I look at the values generated at the receiver(), I notice some weird prints which look like:
a
b
c
de
e
f
gh
i
I am struggling to see why this might be the case. Something seems to go wrong when calling receiver, perhaps because receiver is non-blocking?
How should I get around this issue?
Also, when I use ThreadPool(1), I do not see this issue. My actual problem has a much larger function that is called from help_func, so I would like to run it under multiple threads ideally.
You are encountering a classic concurrency problem: everything you think is atomic is not. The print function actually writes two things: the data you pass to it and the end argument, which by default is "\n".
So the concatenated output ("de", "gh") is the result of one thread writing its data, then another thread writing its data, and then both writing their newlines.
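One common way to avoid this kind of interleaving is to serialize the calls to print with a lock, so that the data and the trailing newline are written as one unit. The following is only a minimal sketch under that assumption: print_lock and run_all are hypothetical names that are not in the question, and the data list is a stand-in with string values.

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical module-level lock shared by all worker threads.
print_lock = threading.Lock()


def receiver(group):
    # Only one thread at a time may print, so the value and the
    # newline written by print() cannot be split by another thread.
    with print_lock:
        print(group)


def help_func(in_vala, in_valb, in_valc):
    receiver(in_vala)


def run_all(items, workers=4):
    # Hypothetical helper: run help_func over all items on a thread pool.
    with ThreadPool(workers) as pool:
        pool.starmap(help_func, items)


# Stand-in for the data in the question, using strings as values.
data = [(c, c, 1) for c in "abcdefghi"]

run_all(data)
```

The order of the lines can still vary between runs, since the threads are scheduled independently; the lock only guarantees that each value ends up on its own line.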
It is all explained much better in this Raymond Hettinger talk.
P.S.: I hope that you're aware of the Python GIL. In short: in CPython, only one thread can execute Python bytecode at any given time. If you want to speed up the execution of a CPU-bound function, use multiprocessing. Multithreading is useful when your threads are blocked most of the time (for example, networking code mostly waits for packets to arrive, so threads are fine for that).
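If the real work is CPU-bound, switching to processes could look roughly like the sketch below. It assumes that the function and its arguments can be pickled, and the body of help_func here is a hypothetical stand-in computation, not the function from the question. Results are returned to the parent and printed there, which also sidesteps the interleaved output.

```python
from multiprocessing import Pool


def help_func(in_vala, in_valb, in_valc):
    # Hypothetical stand-in for a CPU-heavy computation.
    total = sum(i * i for i in range(in_valc * 100_000))
    return in_vala, total


def main():
    # Stand-in for the data in the question, using strings as values.
    data = [(c, c, 1) for c in "abcdefghi"]
    # Each worker is a separate process with its own interpreter and GIL.
    with Pool(4) as pool:
        results = pool.starmap(help_func, data)
    # Printing happens only in the parent process, one result at a time.
    for name, total in results:
        print(name, total)


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters here, because on platforms that start workers by spawning a fresh interpreter the module is imported again in each child process.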