Tags: python, concurrency, gevent, eventlet

How to efficiently do many tasks a "little later" in Python?


I have a process that needs to perform a bunch of actions "later" (usually after 10-60 seconds). The problem is that there can be a lot of those "later" actions (thousands), so using a thread per task is not viable. I know of tools like gevent and eventlet, but one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it).

What I'm wondering is: what are my options? Suggestions are welcome along the lines of libraries (if you've used any of the ones mentioned, please share your experiences), techniques (Python's "coroutine" support, a single thread that sleeps for a while and checks a queue), how to make use of zeromq's poll or event loop to do the job, or something else.
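
For context, if gevent integration turns out to be acceptable, the "schedule it later" pattern would look roughly like the sketch below. The work function, delays, and task count are made up purely for illustration:

    import gevent

    def do_action(task_id):
        # placeholder for whatever each deferred task actually does
        print("running task", task_id)

    # greenlets are cheap enough to schedule thousands of them; each one
    # sits in the gevent hub until its delay expires, then runs do_action
    jobs = [gevent.spawn_later(10 + (i % 50), do_action, i) for i in range(1000)]
    gevent.joinall(jobs)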


Solution

  • Consider using a priority queue with one or more worker threads to service the tasks. The main thread can add work to the queue, with a timestamp of the soonest it should be serviced. Worker threads pop work off the queue, sleep until the priority value's timestamp is reached, do the work, and then pop another item off the queue. (A minimal sketch of this simpler setup appears after the full example below.)

    How about a more fleshed-out answer? mklauber makes a good point: if there's a chance all of your workers might be sleeping when you have new, more urgent work, then a queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, which is available from the heapq module. Instead, we'll make use of a different synchronization primitive: a condition variable, which in Python is spelled threading.Condition.

    The approach is fairly simple: peek at the heap, and if the work is current, pop it off and do that work. If there is work but it's scheduled in the future, just wait on the condition until then; if there's no work at all, sleep until notified.

    The producer does its fair share of the work: every time it adds new work, it notifies the condition, so if there are sleeping workers, they'll wake up and recheck the queue for newer work.

    import heapq, time, threading
    
    START_TIME = time.time()
    SERIALIZE_STDOUT = threading.Lock()
    def consumer(message):
        """The actual work function.  Never mind the lock here; it just keeps
           the output nicely formatted.  A real work function probably won't
           need it, or might need quite different synchronization."""
        with SERIALIZE_STDOUT:
            print(time.time() - START_TIME, message)
    
    def produce(work_queue, condition, timeout, message):
        """called to put a single item onto the work queue."""
        prio = time.time() + float(timeout)
        condition.acquire()
        heapq.heappush(work_queue, (prio, message))
        condition.notify()
        condition.release()
    
    def worker(work_queue, condition):
        condition.acquire()
        stopped = False
        while not stopped:
            now = time.time()
            if work_queue:
                prio, data = work_queue[0]
                if data == 'stop':
                    # wake the next sleeping worker so the sentinel propagates
                    # to every thread in the pool before this one exits
                    condition.notify()
                    stopped = True
                    continue
                if prio < now:
                    heapq.heappop(work_queue)
                    condition.release()
                    # do some work!
                    consumer(data)
                    condition.acquire()
                else:
                    condition.wait(prio - now)
            else:
                # the queue is empty, wait until notified
                condition.wait()
        condition.release()
    
    if __name__ == '__main__':
        # first set up the work queue and worker pool
        work_queue = []
        cond = threading.Condition()
        pool = [threading.Thread(target=worker, args=(work_queue, cond))
                for _ignored in range(4)]
        for thread in pool:
            thread.start()
    
        # now add some work
        produce(work_queue, cond, 10, 'Grumpy')
        produce(work_queue, cond, 10, 'Sneezy')
        produce(work_queue, cond, 5, 'Happy')
        produce(work_queue, cond, 10, 'Dopey')
        produce(work_queue, cond, 15, 'Bashful')
        time.sleep(5)
        produce(work_queue, cond, 5, 'Sleepy')
        produce(work_queue, cond, 10, 'Doc')
    
        # and just to make the example a bit more friendly, tell the threads to stop after all
        # the work is done
        produce(work_queue, cond, float('inf'), 'stop')
        for thread in pool:
            thread.join()
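
    For completeness, here is a minimal sketch of the simpler setup described at the top of this answer: a queue.PriorityQueue with a worker that just sleeps until the head item is due. The names and delays are illustrative. Note that a worker sleeping this way won't notice a more urgent item that arrives mid-sleep, which is exactly the weakness the condition-variable version above fixes.

    import queue, threading, time

    work_queue = queue.PriorityQueue()

    def produce(timeout, message):
        # the priority is the absolute time at which the item becomes due
        work_queue.put((time.time() + timeout, message))

    def worker():
        while True:
            due, message = work_queue.get()
            if message == 'stop':
                break
            delay = due - time.time()
            if delay > 0:
                time.sleep(delay)   # simple, but blind to more urgent late arrivals
            print(time.time(), message)

    t = threading.Thread(target=worker)
    t.start()
    produce(10, 'Grumpy')
    produce(5, 'Happy')
    produce(float('inf'), 'stop')
    t.join()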