Tags: python, python-asyncio

Parallelizing a for loop in asyncio


I currently have a for loop as follows:

async def process(tasks, num):
    count = 0
    results = []
    for task in tasks:
        if count >= num:
            break
        result = await some_async_task(task)
        if result == 'foo':
            continue
        results.append(result)
        count += 1
    return results

I was wondering if I could use the gather or wait_for primitives here, but I am not sure how to implement this if-logic with them. I don't want to unnecessarily await a task once count >= num: if there are 20 tasks and num = 4, I don't want to run all 20 tasks.


Solution

  • You could process the tasks in batches whose size equals the number of results you still need. If you pass such a batch to asyncio.gather(), it will run the batch concurrently and preserve the order of the results. For example:

    import asyncio
    import itertools

    async def process(tasks, num):
        results = []
        task_iter = iter(tasks)
        while len(results) < num:
            # Pull only as many tasks as results we still need.
            next_batch = tuple(itertools.islice(task_iter, num - len(results)))
            if len(next_batch) == 0:
                break
            batch_results = await asyncio.gather(*next_batch)
            # Skip the unwanted 'foo' results, as in the original loop.
            results.extend(r for r in batch_results if r != 'foo')
        return results