I'm new in python threading and I'm experimenting this:
When I run something in threads (whenever I print outputs), it never seems to be running in parallel. Also, my functions take the same time that before using the library concurrent.futures (ThreadPoolExecutor).
I have to calculate the gains of some attributes over a dataset (I cannot use libraries). Since I have about 1024 attributes and the function was taking about a minute to execute (and I have to use it in a for iteration) I dicided to split the array of attributes
into 10 (just as an example) and run the separete function gain(attribute)
separetly for each sub array. So I did the following (avoiding some extra unnecessary code):
def calculate_gains(self):
splited_attributes = np.array_split(self.attributes, 10)
result = {}
for atts in splited_attributes:
with concurrent.futures.ThreadPoolExecutor() as executor:
future = executor.submit(self.calculate_gains_helper, atts)
return_value = future.result()
self.gains = {**self.gains, **return_value}
Here's the calculate_gains_helper:
def calculate_gains_helper(self, attributes):
inter_result = {}
for attribute in attributes:
inter_result[attribute] = self.gain(attribute)
return inter_result
Am I doing something wrong? I read some other older posts but I couldn't get any info. Thanks a lot for any help!
I had this same trouble and fixed by moving the iteration to within the context of the ThreadPoolExecutor, or else, you'll have to wait for the context to finish and start another one.
Here is a probably fix for your code:
def calculate_gains(self):
splited_attributes = np.array_split(self.attributes, 10)
result = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
for atts in splited_attributes:
future = executor.submit(self.calculate_gains_helper, atts)
return_value = future.result()
self.gains = {**self.gains, **return_value}
To demonstrate better what I mean here is a sample code:
Below is a non working code. Threads will execute synchronoulsly...
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep
def t(reference):
i = 0
for i in range(10):
print(f"{reference} :" + str(i))
i+=1
sleep(1)
futures = []
refs = ["a", "b", "c"]
for i in refs:
with ThreadPoolExecutor(max_workers=3) as executor:
futures.append(executor.submit(t, i))
for future in as_completed(futures):
print(future.result())
Here is the fixed code:
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep
def t(reference):
i = 0
for i in range(10):
print(f"{reference} :" + str(i))
i+=1
sleep(1)
futures = []
refs = ["a", "b", "c"]
with ThreadPoolExecutor(max_workers=3) as executor: #swapped
for i in refs: #swapped
futures.append(executor.submit(t, i))
for future in as_completed(futures):
print(future.result())
You can try this on your terminal and check out the outputs.