I was trying various approaches with python multi-threading to see which one fits my requirements. To give an overview, I have a bunch of items that I need to send to an API. Then based on the response, some of the items will go to a database and all the items will be logged; e.g., for an item if the API returns success, that item will only be logged but when it returns failure, that item will be sent to database for future retry along with logging.
Now based on the API response I can separate out success items from failure and make a batch query with all failure items, which will improve my database performance. To do that, I am accumulating all requests at one place and trying to perform multithreaded API calls(since this is an IO bound task, I'm not even thinking about multiprocessing) but at the same time I need to keep track of which response belongs to which request.
Coming to the actual question, I tried two different approaches which I thought would give nearly identical performance, but there turned out to be a huge difference.
To simulate the API call, I created an API in my localhost with a 500ms sleep(for avg processing time). Please note that I want to start logging and inserting to database after all API calls are complete.
Approach - 1(With threading.Thread and queue.Queue())
import requests
import datetime
import threading
import queue
def target(data_q):
while not data_q.empty():
data_q.get()
response = requests.get("https://postman-echo.com/get?foo1=bar1&foo2=bar2")
print(response.status_code)
data_q.task_done()
if __name__ == "__main__":
data_q = queue.Queue()
for i in range(0, 20):
data_q.put(i)
start = datetime.datetime.now()
num_thread = 5
for _ in range(num_thread):
worker = threading.Thread(target=target(data_q))
worker.start()
data_q.join()
print('Time taken multi-threading: '+str(datetime.datetime.now() - start))
I tried with 5, 10, 20 and 30 times and the results are below correspondingly,
Time taken multi-threading: 0:00:06.625710
Time taken multi-threading: 0:00:13.326969
Time taken multi-threading: 0:00:26.435534
Time taken multi-threading: 0:00:40.737406
What shocked me here is, I tried the same without multi-threading and got almost same performance.
Then after some googling around, I was introduced to futures module.
Approach - 2(Using concurrent.futures)
def fetch_url(im_url):
try:
response = requests.get(im_url)
return response.status_code
except Exception as e:
traceback.print_exc()
if __name__ == "__main__":
data = []
for i in range(0, 20):
data.append(i)
start = datetime.datetime.now()
urls = ["https://postman-echo.com/get?foo1=bar1&foo2=bar2" + str(item) for item in data]
with futures.ThreadPoolExecutor(max_workers=5) as executor:
responses = executor.map(fetch_url, urls)
for ret in responses:
print(ret)
print('Time taken future concurrent: ' + str(datetime.datetime.now() - start))
Again with 5, 10, 20 and 30 times and the results are below correspondingly,
Time taken future concurrent: 0:00:01.276891
Time taken future concurrent: 0:00:02.635949
Time taken future concurrent: 0:00:05.073299
Time taken future concurrent: 0:00:07.296873
Now I've heard about asyncio, but I've not used it yet. I've also read that it gives even better performance than futures.ThreadPoolExecutor().
Final question, If both approaches are using threads(or so I think) then why there is a huge performance gap? Am I doing something terribly wrong? I looked around. Was not able to find a satisfying answer. Any thoughts on this would be highly appreciated. Thanks for going through the question.
[Edit 1]The whole thing is running on python 3.8. [Edit 2] Updated code examples and execution times. Now they should run on anyone's system.
The documentation of ThreadPoolExecutor
explains in detail how many threads are started when the max_workers
parameter is not given, as in your example. The behaviour is different depending on the exact Python version, but the number of tasks started is most probably more than 3, the number of threads in the first version using a queue. You should use futures.ThreadPoolExecutor(max_workers= 3)
to compare the two approaches.
To the updated Approach - 1 I suggest to modify the for loop a bit:
for _ in range(num_thread):
target_to_run= target(data_q)
print('target to run: {}'.format(target_to_run))
worker = threading.Thread(target= target_to_run)
worker.start()
The output will be like this:
200
...
200
200
target to run: None
target to run: None
target to run: None
target to run: None
target to run: None
Time taken multi-threading: 0:00:10.846368
The problem is that the Thread
constructor expects a callable object or None
as its target
. You do not give it a callable, rather queue processing happens on the first invocation of target(data_q)
by the main thread, and 5 threads are started that do nothing because their target
is None
.