Tags: python, parallel-processing, queue, gevent

How to process multiple tasks concurrently while maintaining the order of input and output?


Currently, there is a Python project that uses gevent to submit tasks that execute socket calls against one of our computing resources. The Python program generates requests for almost 1000 objects and executes them concurrently. When it receives a response (a fixed-width string stream), it writes the output straight to a file, appending to that file as the task results come in. This keeps memory overhead low and keeps things moving as fast as possible.
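
For context, a stripped-down sketch of that kind of setup might look like the following; the host, port, request format, and fetch_record helper are all hypothetical stand-ins for the real socket protocol, which isn't shown here.

    import gevent
    from gevent import socket

    def fetch_record(obj_id):
        # Hypothetical socket exchange with the computing resource; the real
        # request/response protocol is not part of the question.
        sock = socket.create_connection(("compute-host", 9000))  # placeholder endpoint
        try:
            sock.sendall(("GET %d\n" % obj_id).encode())
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks)
        finally:
            sock.close()

    def worker(obj_id, out):
        # Each greenlet appends its fixed-width record as soon as it arrives,
        # so the file ends up in completion order, not submission order.
        out.write(fetch_record(obj_id))
        out.flush()

    with open("results.dat", "ab") as out:  # hypothetical output file
        jobs = [gevent.spawn(worker, i, out) for i in range(1000)]
        gevent.joinall(jobs)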

Now, as with all projects, a new requirement has been introduced: the Python solution needs to sort the data in the file. What complicates this is that the output file is fixed width, and slicing it up and sorting it within Python would be too much throwaway work.

Is there a pattern where gevent can execute a list of tasks in parallel but, somehow, keep the results in the same order they were submitted in the list? I do have to keep in mind that the results that come back are sizable, and I'm trying to keep memory requirements as low as possible.


Solution

  • This is a very simple approach, but it might work within the constraints you've outlined. It does not, however, make use of gevent directly.

    Write the output of each task to a temporary file named according to the id (submission order) of the task. When all tasks are complete, read the files in id order and append the contents of each to the final output file. That way, only one task's output is in memory at any given time.
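
    A minimal sketch of that idea, assuming the existing socket call is wrapped in a hypothetical do_work(payload) function returning the fixed-width bytes for one object, and that the tasks are still spawned as gevent greenlets:

        import os
        import shutil
        import tempfile

        import gevent

        def run_task(task_id, payload, tmp_dir):
            result = do_work(payload)  # do_work: placeholder for the existing socket call
            # Name the part file by zero-padded task id so submission order is recoverable.
            with open(os.path.join(tmp_dir, "%06d.part" % task_id), "wb") as part:
                part.write(result)

        def run_all(payloads, final_path):
            tmp_dir = tempfile.mkdtemp(prefix="results_")
            try:
                jobs = [gevent.spawn(run_task, i, p, tmp_dir)
                        for i, p in enumerate(payloads)]
                gevent.joinall(jobs, raise_error=True)

                # Concatenate the per-task files in submission order.
                with open(final_path, "wb") as out:
                    for i in range(len(payloads)):
                        with open(os.path.join(tmp_dir, "%06d.part" % i), "rb") as part:
                            shutil.copyfileobj(part, out)
            finally:
                shutil.rmtree(tmp_dir)

    Zero-padding the id keeps the file names unambiguous, and shutil.copyfileobj streams each part in chunks rather than loading it whole, so the memory footprint stays small even for sizable results.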