python, multithreading, concurrent.futures

Making multiple socket connections in Python, each in its own thread: stuck after 3 connections?


I have a dict of objects containing username/host information for connecting to a server, and I want each connection to run in its own thread. However, at most 3 of my 100 test objects connect properly, whether I use concurrent.futures or the threading module. I suspect it has something to do with the current thread executing before the previous threads have finished, because sleeping 2 seconds between connections took me from one connected user to three. That is obviously not a workable solution, and increasing the sleep time to 5 seconds or more still only connected 3 users, which leads me to think there may be another issue as well.

I have been playing around with the wait keyword of executor.shutdown without really understanding it, with no luck. I read somewhere that the problem might be related to the GIL, but according to other sources the GIL does not affect what I'm trying to do. Personally, I have no idea how to get around it other than using multiprocessing instead of threads.

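import concurrent.futures
import time

number_of_objs = 100
nickname_prefix = "Testuser"  # not shown in the original; inferred from the output below

def run_(obj):
    obj.connect()

objs = {}

for i in range(number_of_objs):
    nickname = f"{nickname_prefix}_{i}"
    hostname = "server.example.com"
    port = 1234
    # CreateObjs is the asker's own connection class (definition not shown)
    objs[i] = CreateObjs(nickname, hostname, port)

with concurrent.futures.ThreadPoolExecutor() as executor:
    for k in objs.keys():
        time.sleep(2)
        executor.submit(run_, objs[k])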

# Result:
# Testuser_0 successfully connected to server.example.com at port 1234
# Testuser_1 successfully connected to server.example.com at port 1234
# Testuser_2 successfully connected to server.example.com at port 1234

Solution

  • If I understand your intentions correctly, ThreadPoolExecutor is not the right construct here. First, test the following non-concurrent code:

    for k in objs.keys():
        run_(objs[k])
    

    This code should make the connections sequentially, with each one starting when the previous one has ended. If you want to use ThreadPoolExecutor, this sequential version has to work first, meaning that every call to connect() must eventually return.

    What ThreadPoolExecutor actually does is run a limited number of tasks concurrently; if there are more tasks than workers, the remaining tasks are queued until earlier tasks finish. The GIL only matters when the tasks are pure computation: such tasks cannot run truly in parallel in threads, so a thread pool gives no speedup for them. Threads that are blocked on I/O release the GIL, so that is not the problem here. However, if the tasks in run_ keep waiting for the remote server, keep sending test data, or otherwise never return, ThreadPoolExecutor does not work: the first tasks occupy all the workers indefinitely and the queued tasks never start.
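The queueing behaviour described above can be reproduced without a server. The following is only a minimal sketch: the function name demo_pool_starvation, the pool size of 3, and the use of a threading.Event as a stand-in for a connect() call that never returns are all assumptions made for illustration, not part of the original code.

```python
import concurrent.futures
import threading
import time


def demo_pool_starvation(num_tasks=10, max_workers=3, observe_seconds=0.5):
    """Submit blocking tasks to a small pool.

    Returns (tasks started while the workers were blocked,
             tasks started after the workers were released).
    """
    release = threading.Event()
    started = []
    lock = threading.Lock()

    def blocking_task(i):
        with lock:
            started.append(i)
        # Hypothetical stand-in for a connect() that never returns on its own.
        release.wait()

    executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    for i in range(num_tasks):
        executor.submit(blocking_task, i)

    # Give the workers time to pick up tasks; the rest stay in the queue.
    time.sleep(observe_seconds)
    with lock:
        started_while_blocked = len(started)

    # Unblock everything so the queued tasks can run and the pool can shut down.
    release.set()
    executor.shutdown(wait=True)
    return started_while_blocked, len(started)
```

Calling demo_pool_starvation() should show that only as many tasks as there are workers start while the early tasks are blocked, and that the queued tasks run only once those workers are freed.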

    If the goal is to start 100 concurrent threads that do I/O, use this instead:

    for k in objs.keys():
        threading.Thread(target=run_, args=(objs[k],)).start()
    

    Note: you cannot start an arbitrary number of threads like this. 100 is fine, and 1000 probably is too, but there are practical limits to the number of active threads.
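A slightly fuller sketch of the one-thread-per-connection approach keeps a reference to every thread so the threads can be joined later. The FakeConnection class below is hypothetical; it only imitates a connect() call that stays active until told to stop, since the asker's real class is not shown.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for the asker's connection objects."""

    def __init__(self, nickname):
        self.nickname = nickname
        self.connected = threading.Event()
        self._stop = threading.Event()

    def connect(self):
        # Signal that the "connection" is up, then block until disconnect().
        self.connected.set()
        self._stop.wait()

    def disconnect(self):
        self._stop.set()


def start_all(conns):
    """Start one thread per connection and return the thread objects."""
    threads = []
    for conn in conns:
        # daemon=True so a stuck connection cannot keep the interpreter alive.
        t = threading.Thread(target=conn.connect, name=conn.nickname, daemon=True)
        t.start()
        threads.append(t)
    return threads


conns = [FakeConnection(f"Testuser_{i}") for i in range(100)]
threads = start_all(conns)

# Wait until every connection reports that it is up (or a timeout expires).
all_connected = all(c.connected.wait(timeout=5) for c in conns)

# Shut everything down cleanly.
for c in conns:
    c.disconnect()
for t in threads:
    t.join(timeout=5)
```

If the executor interface is preferred, passing max_workers=len(objs) to ThreadPoolExecutor gives every long-lived task its own worker, at the cost of the same practical limit on thread count.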