Multiprocessing threadpool concatenates arguments

I have a very long list called data. Let's assume it looks like this:

[(a, a, 1),
(b, b, 1),
(c, c, 1),
(d, d, 1),
(e, e, 1),
(f, f, 1),
(g, g, 1),
(h, h, 1),
(i, i, 1),]

I am trying to use multithreading as follows:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
pool.starmap(help_func, data)

help_func is as follows:

def help_func(in_vala, in_valb, in_valc):
    print("asking for " + str(in_vala) + " asking for " + str(in_valb))
    receiver(in_vala)

and receiver is a simple test function:

def receiver(group):
    print(group)

When I run my program, I can see that the output from help_func is correct, i.e., it enumerates the values of data.

However, when I look at the values generated at the receiver(), I notice some weird prints which look like:

a
b
c
de
e
f
gh
i

I am struggling to see why this might be the case. Something seems to go wrong when calling receiver, perhaps because receiver is non-blocking?

How should I work around this issue?

Also, when I use ThreadPool(1), I do not see this issue. My actual problem has a much larger function that is called from help_func, so I would like to run it under multiple threads ideally.

Solution

You are encountering a classic concurrency problem: everything you think is atomic is not. The print function actually writes two things: the data you pass to it and then the end argument, which defaults to "\n".

So the concatenation is the result of one thread writing its data, then another thread writing its data, and then both writing their newlines.
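One possible way to avoid the interleaving is to let only one thread print at a time. The sketch below is a minimal illustration, not the only fix: print_lock and safe_print are hypothetical names introduced here, and the data list is rebuilt with strings so the example runs on its own.

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical helper names (not from the question): a single shared lock
# that every thread must hold while it prints.
print_lock = threading.Lock()

def safe_print(*args, **kwargs):
    # The text and the trailing newline are written while holding the lock,
    # so another thread cannot slip its output in between them.
    with print_lock:
        print(*args, **kwargs)

def receiver(group):
    safe_print(group)

def help_func(in_vala, in_valb, in_valc):
    safe_print("asking for " + str(in_vala) + " asking for " + str(in_valb))
    receiver(in_vala)

if __name__ == "__main__":
    # Stand-in for the data list from the question.
    data = [(c, c, 1) for c in "abcdefghi"]
    with ThreadPool(4) as pool:
        pool.starmap(help_func, data)
```

The order of the lines can still vary between runs, because the threads are scheduled independently; the lock only guarantees that each line stays intact. For anything beyond debugging output, the standard logging module is another option, since it is designed to be thread-safe.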

This is all explained much better in this Raymond Hettinger talk.

P.S.: I hope you are aware of the Python GIL (global interpreter lock). In short, in CPython only one thread can execute Python bytecode at any given time. If you want to speed up a CPU-bound function, use multiprocessing; multithreading is useful when your threads are blocked most of the time (for example, networking code mostly waits for packets to arrive, so threads are fine for that).
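To illustrate the blocking case, here is a small sketch in which time.sleep stands in for waiting on the network. The names blocking_task and run_blocking_tasks are made up for this example, and the delay is an arbitrary assumption. Because a sleeping thread releases the GIL, the waits overlap and the whole batch should finish in well under the serial time.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def blocking_task(task_id, delay):
    """Hypothetical stand-in for I/O-bound work such as a network request."""
    time.sleep(delay)  # sleeping releases the GIL, much like waiting on a socket
    return task_id

def run_blocking_tasks(n_tasks, n_threads, delay):
    """Run n_tasks blocking tasks on n_threads threads; return (results, seconds)."""
    args = [(i, delay) for i in range(n_tasks)]
    start = time.perf_counter()
    with ThreadPool(n_threads) as pool:
        results = pool.starmap(blocking_task, args)
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_blocking_tasks(n_tasks=8, n_threads=4, delay=0.2)
    # Run serially, the eight tasks would need about 8 * 0.2 seconds.
    print(results, round(elapsed, 2))
```

For CPU-bound work, multiprocessing.Pool offers the same starmap method, so the change can be as small as swapping the import, provided the function and its arguments can be pickled.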