I have some code that makes two calls to an API for each item it processes. It works as intended, but it takes 1-3 seconds per item. To speed it up, I tried to use the threading module to perform 10 requests at a time, but it seems to behave the same way it did before I added the threading. Processing the data from the API takes ~0.2 milliseconds per call, so that should not be the holdup.
Here is the relevant portion of my code:
import threading
.
.
.
def secondary():
    global queue
    item = queue.pop()
    queue |= func1(item)  # func1 returns data from an API using the requests module
    with open('data/' + item, 'w+') as f:
        f.write(func2(item))  # func2 also returns data from an API using the requests module
    global num_procs
    num_procs -= 1

def primary():
    t = []  # threads
    global num_procs
    num_procs += min(len(queue), 10 - num_procs)
    for i in range(min(len(queue), 10 - num_procs)):
        t += [threading.Thread(target=secondary)]
    for i in t:
        i.start()
        i.join()

queue = {'initial_data'}
num_procs = 0  # number of currently running processes - when it reaches 10, stop creating new ones
while num_procs or len(queue):
    primary()
What do I need to do to make it run concurrently? I would rather use threading, but if an asynchronous approach is better, how do I implement that?
Immediately after starting each thread, you wait for the thread to finish:
for i in t:
    i.start()
    i.join()
The threads never get a chance to execute in parallel. Instead, only wait for the threads to finish after you've started all of them.
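A minimal sketch of that change, using the same t list of threads from your primary():

for i in t:
    i.start()
# join() only after every thread has been started, so the requests overlap.
for i in t:
    i.join()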
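If you would rather not manage the start/join bookkeeping yourself, the standard library's concurrent.futures.ThreadPoolExecutor caps the number of worker threads for you. A rough sketch under your code's assumptions (func1 returns a set of new items, func2 returns the payload to write):

from concurrent.futures import ThreadPoolExecutor

def process(item):
    # One unit of work: collect follow-up items, then write the payload.
    new_items = func1(item)
    with open('data/' + item, 'w+') as f:
        f.write(func2(item))
    return new_items

queue = {'initial_data'}
while queue:
    batch, queue = list(queue), set()
    # Up to 10 requests run at a time; map() yields each result in input
    # order, and the with-block waits for the whole batch before looping.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for found in pool.map(process, batch):
            queue |= found

This also sidesteps the unsynchronized updates to the shared queue and num_procs, since each batch's results are merged back in the main thread.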