
Python multiprocessing TypeError: join() takes exactly 1 argument (2 given)


I've done a lot of studying about multiprocessing! Basically I'm downloading data from an API and inserting it into a database.

I create a pool, call the download function with pool.imap, build a tuple from the results, and insert everything into the DB in one shot.

I call this function repeatedly, and at some point my process hangs! I tried to follow https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map and call join with a timeout.

But pool.join(timeout) raises "TypeError: join() takes exactly 1 argument (2 given)". I suppose the one argument is the implicit "self"?

A short chunk of the code:

timeout = 10
pool = Pool(10)
in_tuple = [x for x in pool.imap(multi_details,items) if x is not None]
pool.close()
pool.join(timeout) # from the documentation I should be able to put the timeout in join

writing_to_database(in_tuple)

# function that generates the content for the DB
def multi_details(item):
    tuple = get_details(item)  # note: "tuple" shadows the built-in; a name like "details" would be safer
    return tuple

I see different ways to create processes and call terminate() or join(timeout), but none of them uses imap/map, which is much simpler to work with in my case!
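A quick way to see why that TypeError happens: the timeout parameter in the linked docs belongs to Process.join(), not Pool.join(), and the two signatures can be checked directly (a minimal sketch using inspect):

```python
import inspect
from multiprocessing import Process
from multiprocessing.pool import Pool

# Pool.join() accepts no timeout at all...
print(inspect.signature(Pool.join))     # (self)
# ...while Process.join() is the one that takes an optional timeout
print(inspect.signature(Process.join))  # (self, timeout=None)
```

So pool.join(timeout) passes two arguments (self plus the timeout) to a method that accepts only self, which is exactly what the error message says.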


Solution

  • This is the solution!

    I didn't manage to use "next(timeout)" because it only parses a few items and then stops before running through the entire list! (As far as I can tell, once one item times out, the iterator keeps waiting on that same item, so every subsequent next(timeout) times out too.)

    I started using apply_async instead. The only thing is that I have a strange feeling it is slower than imap. (That would make sense: calling .get() right after each apply_async waits for that task to finish before submitting the next one, so the tasks no longer run in parallel.)

    The functional code is:

    from multiprocessing import Pool

    timeout = 1
    in_tuple = []  # must be initialized before the loop
    pool = Pool(10)
    for x in items:
        try:
            res = pool.apply_async(multi_details, (x,)).get(timeout)
        except Exception:
            pass  # you can put anything you want here, but my scope was to skip the things that took too long!
        else:
            if res is not None:  # there may be a more Pythonic way to write this. Any help will be highly appreciated!
                in_tuple.append(res)
    pool.close()
    pool.join()
    

    Thank you and I hope it's useful!