Tags: python, git, concurrency, gitpython, concurrent.futures

GitPython causes concurrent.futures.ThreadPoolExecutor to ignore max_workers


I am writing some Python code to perform operations on a large number of git repositories in parallel. To do this I am trying to combine concurrent.futures and GitPython, cloning each repository in a separate future task. This is using the built-in Python 2.7.6 on OS X 10.10 and with GitPython 0.3.5 and futures 2.2.0 (the version back-ported to 2.7) both installed via pip.

A simple example of the code I'm using is as follows:

import time
from concurrent import futures
import shutil
import os
from git import Repo


def wait_then_return(i):
    print('called: %s' % i)
    time.sleep(2)
    return i


def clone_then_return(i):
    print('called: %s' % i)
    path = os.path.join('/tmp', str(i))
    os.mkdir(path)
    # clone some arbitrary repo
    Repo.clone_from('https://github.com/ros/rosdistro', path)
    shutil.rmtree(path)
    return i



if __name__ == "__main__":

    tasks = 20
    workers = 4

    with futures.ThreadPoolExecutor(max_workers=workers) as executor:

        # this works as expected... delaying work until a thread is available
        # fs = [executor.submit(wait_then_return, i) for i in range(0, tasks)]
        # this doesn't... all 20 come in quick succession
        fs = [executor.submit(clone_then_return, i) for i in range(0, tasks)]

        for future in futures.as_completed(fs):
            print('result: %s' % future.result())

When I submit the wait_then_return function to the executor, I get the expected behaviour: the "called" lines are printed in a group of four at first, and then in further groups of roughly that size as threads become free, until all the futures are complete. If I switch to clone_then_return, the executor appears to ignore the max_workers argument and run all twenty futures in parallel.

What could the cause of this be?


Solution

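  • It turned out that the git call I was using had authentication issues, which caused each future to complete almost immediately. The executor was respecting max_workers all along; the tasks just finished so quickly that all twenty appeared to start at once. All is still sane in the world of concurrency.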
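
One way to check a diagnosis like this is to record how many tasks are actually running at the same time and to inspect each future for an exception. The following is a minimal sketch, assuming only the standard concurrent.futures API (or the futures backport on 2.7). The names run_batch, slow_task, and failing_task are hypothetical and introduced purely for illustration; failing_task is a stand-in for a clone that fails immediately, not the real GitPython call.

```python
import threading
import time
from concurrent import futures


def run_batch(task, tasks=20, workers=4):
    """Run `task` on a bounded pool.

    Returns (peak number of simultaneously running tasks, results, errors).
    Hypothetical helper, not part of concurrent.futures.
    """
    lock = threading.Lock()
    state = {'active': 0, 'peak': 0}

    def tracked(i):
        # Count this task as running and remember the highest count seen.
        with lock:
            state['active'] += 1
            state['peak'] = max(state['peak'], state['active'])
        try:
            return task(i)
        finally:
            with lock:
                state['active'] -= 1

    results, errors = [], []
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        fs = [executor.submit(tracked, i) for i in range(tasks)]
        for future in futures.as_completed(fs):
            # exception() returns None when the task finished normally.
            err = future.exception()
            if err is not None:
                errors.append(err)
            else:
                results.append(future.result())
    return state['peak'], results, errors


def slow_task(i):
    # Stand-in for work that takes a noticeable amount of time.
    time.sleep(0.05)
    return i


def failing_task(i):
    # Stand-in (assumption) for a clone that fails right away.
    raise RuntimeError('authentication failed for task %s' % i)


if __name__ == "__main__":
    for task in (slow_task, failing_task):
        peak, results, errors = run_batch(task)
        print('%s: peak=%d ok=%d failed=%d'
              % (task.__name__, peak, len(results), len(errors)))
```

In both runs the recorded peak should never exceed the configured number of workers. With the failing task, every future finishes almost instantly and ends up in the error list, which is what makes the output look as if all twenty tasks had started together.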