Clone git repository and import object elsewhere

I have a problem where I want to use a git (gitpython) repo from two different files. However, I cannot come up with a smart way of cloning it only once, and then sharing the object between modules. The repos are pretty large so cloning them every time I need access to the object is not optimal.

I have tried creating a class and an instance of it in a file called utils.py. However, it does not seem like the repos are cloned when I import them even though it is supposed to happen in __init__ of the class in utils.py. Surely this must be some Python safeguard to prevent hanging on imports.

I have tried subclassing dict and using __getattr__ to clone when accessed if it hasn't already but it didn't work. It seemed like it just imported but skipped the cloning, like before.

This is what I need to define in utils.py so that I can import it elsewhere:

compiler_repo = git.Repo.clone_from(someurl, somepath)

Solution

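The code in __init__ is just Python code that runs like any other; there is no safeguard that skips it during import. If you put an infinite loop or a blocking wait in it, the import can indeed hang. A module's top-level code runs once, on the first import. Later imports of the same module reuse the cached module object, so an instance created at module level in utils.py is shared by every module that imports it.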
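To see the import behaviour in isolation, here is a small sketch that does not involve git. It writes a throwaway module (the name demo_utils and its contents are made up for this illustration) to a temporary directory and imports it twice:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for utils.py: the print marks each time the
    # module body actually executes.
    Path(tmp, "demo_utils.py").write_text(
        "print('running module body')\n"
        "shared = object()\n"
    )
    sys.path.insert(0, tmp)
    try:
        first = importlib.import_module("demo_utils")   # executes the body
        second = importlib.import_module("demo_utils")  # served from sys.modules
    finally:
        sys.path.remove(tmp)

# Both imports yield the same module object and the same shared instance.
same_module = first is second
same_object = first.shared is second.shared
print(same_module, same_object)
```

The message from the module body is printed only once, and both imports see the same shared object. A repo object created at module level in utils.py would be shared in the same way.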

You can minimize the startup time of your library by putting the expensive initialization code in a function that is called by the application. It won't change the amount of work to be done, but at least the timing of the work is under the control of the application and it will be easier to figure out what is going wrong (failed init code can be hard to debug).

There are a variety of ways to hide the initialization step. For example, you could define a class with a delayed loader:

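import git  # GitPython

class RepoLoader:
    """Clone a repository on first access and reuse the result afterwards."""

    def __init__(self, url, path):
        self.url = url
        self.path = path
        self._repo = None  # nothing is cloned until the repo is requested

    def fetch_repo(self):
        # The expensive step: runs only when called, not at import time.
        self._repo = git.Repo.clone_from(self.url, self.path)

    @property
    def repo(self):
        # Clone on first access, then return the cached object.
        if self._repo is None:
            self.fetch_repo()
        return self._repo

# Module-level instance shared by every module that imports utils.
# someurl and somepath are the placeholders from the question.
compiler = RepoLoader(someurl, somepath)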

Elsewhere in your package you can ask for the repo using:

from . import utils
compiler_repo = utils.compiler.repo

Now the user of the package can call utils.compiler.fetch_repo() as an initialization step if they want to control when the clone happens, or they can leave it to the first access of utils.compiler.repo. With a little more work you could run fetch_repo in a separate thread so that the rest of your application's initialization can proceed, blocking only when the repo is actually needed by the code.
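One possible shape for the threaded variant is sketched below. This is an illustration rather than a drop-in implementation: the class name BackgroundRepoLoader is made up here, and the clone function is passed in as an argument so the loader itself does not depend on GitPython.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BackgroundRepoLoader:
    """Run an expensive clone in a worker thread; block only on first use."""

    def __init__(self, clone_func, *args):
        self._clone_func = clone_func
        self._args = args
        self._lock = threading.Lock()
        self._future = None
        self._executor = ThreadPoolExecutor(max_workers=1)

    def fetch_repo(self):
        # Start the clone at most once, even if several threads call this.
        with self._lock:
            if self._future is None:
                self._future = self._executor.submit(
                    self._clone_func, *self._args
                )
        return self._future

    @property
    def repo(self):
        # Wait for the background clone to finish (starting it if needed).
        # An exception raised during the clone is re-raised here.
        return self.fetch_repo().result()

# In utils.py this might be wired up as (assumes GitPython is installed;
# someurl and somepath are the placeholders from the question):
#
#     compiler = BackgroundRepoLoader(git.Repo.clone_from, someurl, somepath)
#     compiler.fetch_repo()  # optional: start cloning right away
```

Calling fetch_repo() early starts the clone without blocking, and reading the repo property later waits only for whatever work is still outstanding. Because the result is held in a future, a failed clone surfaces at the point where the repo is first used instead of being lost in the worker thread.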