Let's say I have a list like this:

list_base = ['a','b','c','d']

If I used for xxx in list_base:, the loop would process the list one value at a time. To double the speed of this work, I'm splitting the list into pairs so that I can iterate over two values at once and hand each of them to multiprocessing.
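To make the splitting concrete, this is the pairing I mean (the same slicing used in the code below):

a = ['a', 'b', 'c', 'd']
# Slice into consecutive pairs: indices 0-1, 2-3, ...
a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
if len(a) % 2 != 0:
    # A trailing odd element becomes a one-item chunk
    a_pairs.append([a[-1]])
print(a_pairs)  # [['a', 'b'], ['c', 'd']]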
Basic example
Code 1 (main_code.py):

import api_values

if __name__ == '__main__':
    list_base = ['a','b','c','d']
    api_values.main(list_base)
Code 2 (api_values.py):

import multiprocessing
import datetime

def add_hour(x):
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M')

def main(list_base):
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])
    final_list = []
    for a, b in a_pairs:
        mp_1 = multiprocessing.Process(target=add_hour, args=(a,))
        mp_2 = multiprocessing.Process(target=add_hour, args=(b,))
        mp_1.start()
        mp_2.start()
        mp_1.join()
        mp_2.join()
        final_list.append(mp_1)
        final_list.append(mp_2)
    print(final_list)
When I print final_list, it contains values like this:

[
    <Process name='Process-1' pid=9564 parent=19136 stopped exitcode=0>,
    <Process name='Process-2' pid=5400 parent=19136 stopped exitcode=0>,
    <Process name='Process-3' pid=13396 parent=19136 stopped exitcode=0>,
    <Process name='Process-4' pid=5132 parent=19136 stopped exitcode=0>
]

So I end up with the Process objects themselves, not the return values I wanted from calling the add_hour(x) function.
I found some answers in this question:

How can I recover the return value of a function passed to multiprocessing.Process?

But I couldn't adapt them to my scenario, where the multiprocessing has to live inside a function rather than directly under if __name__ == '__main__':. Whenever I try, I get errors related to where the pieces of code are placed, so I'd appreciate help seeing how this should be structured for my case.
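For illustration only, here is a minimal sketch of how the Queue-based idea from that question might sit inside a function rather than directly under the __main__ guard (the helper add_hour_q and the loop structure are assumptions of this sketch, not code taken from that question):

import multiprocessing
import datetime

def add_hour_q(x, queue):
    # Same work as add_hour, but the result is pushed onto a shared queue
    queue.put(str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M'))

def main(list_base):
    a_pairs = [list_base[i:i+2] for i in range(0, len(list_base)-1, 2)]
    if len(list_base) % 2 != 0:
        a_pairs.append([list_base[-1]])
    queue = multiprocessing.Queue()
    final_list = []
    for pair in a_pairs:
        procs = [multiprocessing.Process(target=add_hour_q, args=(x, queue)) for x in pair]
        for p in procs:
            p.start()
        # Collect one result per started process before joining
        # (order within a pair is not guaranteed)
        final_list.extend(queue.get() for _ in procs)
        for p in procs:
            p.join()
    print(final_list)

Called from main_code.py exactly as before, this prints the timestamped strings instead of the Process objects.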
Note:

These are basic examples; my real use case is extracting data from an API that allows a maximum of two simultaneous calls.
Additional code:

Following @Timus's comment (You might want to look into a **Pool** and **.apply_async**), I came up with the code below. It seems to work, but I don't know whether it is reliable, whether it needs any improvements, or whether this option is the best one; feel free to address that in an answer:
import multiprocessing
import datetime

final_list = []

def foo_pool(x):
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def log_result(result):
    final_list.append(result)

def main(list_base):
    pool = multiprocessing.Pool()
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])
    for a, b in a_pairs:
        pool.apply_async(foo_pool, args=(a,), callback=log_result)
        pool.apply_async(foo_pool, args=(b,), callback=log_result)
    pool.close()
    pool.join()
    print(final_list)
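A note on this callback version: if foo_pool raises, the exception goes unnoticed unless an error_callback is also passed to apply_async (or .get() is called on the result). A minimal sketch of adding one (log_error is just an illustrative name):

import multiprocessing
import datetime

final_list = []

def foo_pool(x):
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def log_result(result):
    final_list.append(result)

def log_error(exc):
    # Runs in the parent process whenever a worker raised instead of returning
    print('worker failed:', repr(exc))

def main(list_base):
    # Cap at two workers to respect the two-simultaneous-calls limit
    pool = multiprocessing.Pool(processes=2)
    a_pairs = [list_base[i:i+2] for i in range(0, len(list_base)-1, 2)]
    if len(list_base) % 2 != 0:
        a_pairs.append([list_base[-1]])
    for pair in a_pairs:
        for x in pair:
            pool.apply_async(foo_pool, args=(x,),
                             callback=log_result,
                             error_callback=log_error)
    pool.close()
    pool.join()
    print(final_list)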
You don't have to use a callback: Pool.apply_async() returns an AsyncResult object, which has a .get() method for retrieving the result of the submitted call. Extension of your attempt:
import time
import multiprocessing
import datetime
from os import getpid

def foo_pool(x):
    print(getpid())
    time.sleep(2)
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def main(list_base):
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])
    final_list = []
    with multiprocessing.Pool(processes=2) as pool:
        for a, b in a_pairs:
            res_1 = pool.apply_async(foo_pool, args=(a,))
            res_2 = pool.apply_async(foo_pool, args=(b,))
            final_list.extend([res_1.get(), res_2.get()])
    print(final_list)

if __name__ == '__main__':
    list_base = ['a','b','c','d']
    start = time.perf_counter()
    main(list_base)
    end = time.perf_counter()
    print(end - start)
I have added the print(getpid()) to foo_pool to show that you're actually using different processes. And I've used time to illustrate that, despite the time.sleep(2) in foo_pool, the overall duration of main is only about 2 seconds per pair (roughly 4 seconds for the 4-item example) rather than 2 seconds per item, because both calls in a pair run in parallel.
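As a further sketch (assuming the only hard constraint is "at most two simultaneous calls" and that result order should match the input): since Pool(processes=2) already caps the concurrency at two, the manual pairing can be dropped entirely and Pool.map used instead, which blocks until all work is done and returns the results in input order:

import time
import datetime
import multiprocessing

def foo_pool(x):
    time.sleep(2)
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def main(list_base):
    # The pool size alone enforces the two-calls-at-a-time limit;
    # map() distributes the items across the workers and preserves their order.
    with multiprocessing.Pool(processes=2) as pool:
        final_list = pool.map(foo_pool, list_base)
    print(final_list)

if __name__ == '__main__':
    main(['a', 'b', 'c', 'd'])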