I've done a lot of reading about multiprocessing. Basically, I'm downloading data from an API and inserting it into a database.
I create a pool, run the download function through pool.imap, collect the results into a list, and insert them all into the DB in one shot.
I call this function repeatedly, and at some point the process hangs. Following https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map I tried to call join with a timeout.
But pool.join(timeout) raises "TypeError: join() takes exactly 1 argument (2 given)". I suppose the one argument it expects is the implicit "self"?
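From experimenting, the timeout parameter seems to belong to Process.join(), not Pool.join(), which takes no arguments at all. A minimal sketch of what I mean (the work function is just a placeholder):

from multiprocessing import Pool, Process
import time

def work():
    time.sleep(2)

if __name__ == '__main__':
    p = Process(target=work)
    p.start()
    p.join(1)        # fine: Process.join() accepts an optional timeout

    pool = Pool(2)
    pool.close()
    pool.join()      # fine: Pool.join() takes no timeout at all
    # pool.join(1)   # raises the TypeError from above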
A short chunk of the code:
from multiprocessing import Pool

timeout = 10
pool = Pool(10)
in_tuple = [x for x in pool.imap(multi_details, items) if x is not None]
pool.close()
pool.join(timeout)  # from the documentation I understood I could pass a timeout to join
writing_to_database(in_tuple)
# function that generates the content for the DB
def multi_details(item):
    details = get_details(item)  # renamed from "tuple", which shadows the built-in
    return details
I've seen different ways to create processes and then call terminate() or join(timeout), but none of them use imap/map, which are much simpler to work with in my case. The pattern I keep running into looks roughly like the sketch below.
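A minimal sketch of that pattern (one Process per item; note that collecting return values this way would also need a Queue, which is exactly the extra plumbing I'd like to avoid):

from multiprocessing import Process

timeout = 10
procs = [Process(target=multi_details, args=(x,)) for x in items]
for p in procs:
    p.start()
for p in procs:
    p.join(timeout)      # Process.join() does accept a timeout
    if p.is_alive():
        p.terminate()    # kill the worker if it ran past the timeout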
This is the solution!
I didn't manage to get imap's next(timeout) to work: it processed only a few items and then stopped before running through the entire list.
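For reference, this is roughly what I tried (a minimal sketch reusing multi_details and items from above). The catch seems to be that imap yields results in order and a timed-out item stays pending, so an uncaught TimeoutError simply ends the loop early:

import multiprocessing
from multiprocessing import Pool

timeout = 10
pool = Pool(10)
it = pool.imap(multi_details, items)
in_tuple = []
while True:
    try:
        res = it.next(timeout)            # IMapIterator.next() does accept a timeout
    except StopIteration:
        break                             # the whole list was consumed
    except multiprocessing.TimeoutError:
        break                             # one slow item stalls everything: retrying
                                          # next() would wait on the same pending result
    if res is not None:
        in_tuple.append(res)
pool.close()
pool.join()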
So I started to use apply_async. The only thing is that I have a feeling it is slower than imap (see the note after the code).
The working code is:
from multiprocessing import Pool

timeout = 1
pool = Pool(10)
in_tuple = []
for x in items:
    try:
        res = pool.apply_async(multi_details, (x,)).get(timeout)
    except Exception:
        pass  # you can put anything you want here, but my goal was to skip the items that took too long
    else:
        if res is not None:  # there may be a more Pythonic way to write this; any help is highly appreciated!
            in_tuple.append(res)
pool.close()
pool.join()
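A note on the speed: in the loop above, .get(timeout) blocks before the next task is submitted, so the pool effectively processes one item at a time. A sketch (same multi_details and items as above) that submits everything up front and collects afterwards keeps the per-item timeout while letting the workers run in parallel:

import multiprocessing
from multiprocessing import Pool

timeout = 1
pool = Pool(10)
handles = [pool.apply_async(multi_details, (x,)) for x in items]  # submit everything first

in_tuple = []
for h in handles:
    try:
        res = h.get(timeout)                 # wait at most `timeout` for this result
    except multiprocessing.TimeoutError:
        continue                             # skip the items that took too long
    if res is not None:
        in_tuple.append(res)

pool.close()
pool.join()  # note: this still waits for skipped tasks that are running; terminate() would kill them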
Thank you and I hope it's useful!