
Shared object store for multiple checkouts with Google's 'repo' tool?


In my organization, we are using Google's repo tool to maintain a codebase spread over ~200 git repositories. Since compilation and test runs are quite slow, I usually have more than one checkout of that source tree on my Linux machine (e.g. one that's currently compiling, and another one where I prepare the next commit).

These checked-out source trees consume about 7.5GB each, with 5.5GB being the git object store (ordinarily in the .git folder of each repository, but repo redirects this to a .repo folder in the root of the source tree) and only 2GB for the actual working copy. So my question is: how can I (easily) make those different checkouts share their object stores so that each git object in the object store is stored only once on my hard disk?

I know that this is possible with multiple checkouts of an individual git repository, but am not sure how repo's redirection of the object store might affect these approaches. Simply replacing duplicate files by hardlinks will probably not work, since git is storing most objects in shared pack files, and those won't be identical between different checkouts even if the objects inside them are.


Solution

  • What I'm doing is this:

    1. Run repo init to initialize the new repo.

    2. Go inside .repo and create two symlinks called project-objects and projects that point to the same-named directories in the existing .repo.

    3. Then go up one level and run repo sync.
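
The three steps above can be sketched as a small shell helper. This is only an illustration: the function name link_repo_stores and the example paths are hypothetical, and it assumes the new tree has been through repo init but not yet repo sync, so that projects and project-objects do not exist there yet.

```shell
#!/bin/sh
# Hypothetical helper: make NEW/.repo/projects and NEW/.repo/project-objects
# symlinks to the same-named directories in EXISTING/.repo.
link_repo_stores() {
    existing=$1
    new=$2
    for name in projects project-objects; do
        # Resolve to an absolute path so the symlink works from anywhere.
        src=$(cd "$existing/.repo/$name" 2>/dev/null && pwd) || {
            echo "missing $existing/.repo/$name" >&2
            return 1
        }
        dst="$new/.repo/$name"
        # Refuse to clobber anything that is already there.
        if [ -e "$dst" ] || [ -L "$dst" ]; then
            echo "$dst already exists" >&2
            return 1
        fi
        ln -s "$src" "$dst" || return 1
    done
}

# Intended use (paths and manifest URL are placeholders):
#   mkdir ~/src/tree2 && cd ~/src/tree2
#   repo init -u <manifest-url>                  # step 1
#   link_repo_stores ~/src/tree1 ~/src/tree2     # step 2
#   repo sync                                    # step 3
```

The helper stops at the first problem instead of overwriting an existing directory, since a mistake here could point a tree at the wrong object store.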

    repo likes this so far; if I run into any problems, I will update this answer.

    Almost certainly, we want to avoid running concurrent repo operations in repo trees that share objects this way, because these operations could, in turn, issue concurrent operations in the same git repos.
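
One way to avoid overlapping operations is to funnel every repo invocation through a shared lock. The following is a minimal sketch, assuming the flock utility from util-linux is installed; the wrapper name with_store_lock and the lock file location are made up for this example, and the lock only protects commands that are actually run through the wrapper.

```shell
#!/bin/sh
# Hypothetical lock file shared by all trees that use the same object store.
STORE_LOCK=${STORE_LOCK:-"$HOME/.repo-shared-store.lock"}

# Run a command while holding an exclusive lock on $STORE_LOCK.
# A second caller blocks until the first command has finished.
with_store_lock() {
    flock "$STORE_LOCK" "$@"
}

# Intended use, from inside any of the sharing trees:
#   with_store_lock repo sync
```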

    If this is the way to go, the obvious next step is to put the global object store outside of any .repo directory in some special location, and point all of them there via symlinks.
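
That next step might look roughly like the sketch below. I have not verified this against repo itself; the function name externalize_repo_stores and the store location are hypothetical. It moves both directories together into one store directory, on the assumption that any relative links between projects and project-objects then stay valid, and leaves symlinks behind in the original .repo.

```shell
#!/bin/sh
# Hypothetical helper: move TREE/.repo/{projects,project-objects} into STORE
# and replace them with symlinks pointing at the new location.
externalize_repo_stores() {
    tree=$1
    store=$2
    mkdir -p "$store" || return 1
    store=$(cd "$store" && pwd) || return 1   # absolute path for the symlinks
    for name in projects project-objects; do
        src="$tree/.repo/$name"
        # Only move real directories; an existing symlink is left alone.
        if [ -L "$src" ] || [ ! -d "$src" ]; then
            echo "$src is not a real directory" >&2
            return 1
        fi
        if [ -e "$store/$name" ]; then
            echo "$store/$name already exists" >&2
            return 1
        fi
        mv "$src" "$store/$name" && ln -s "$store/$name" "$src" || return 1
    done
}

# Intended use (paths are placeholders):
#   externalize_repo_stores ~/src/tree1 ~/src/repo-store
```

Further trees would then symlink to the store directory instead of to another tree's .repo.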

    It looks like the --mirror and --reference parameters of repo should achieve something similar, but I can't find any documentation explaining exactly what they do, and repo help init is scant on details. It looks as if --mirror is supposed to pull down a local mirror of a repo (not a client checkout but a special mirror object), which is then referenced with --reference when checking out client repos.
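
If the flags work the way I guess, the workflow would be roughly as follows. This is a sketch only: make_mirror and make_client are hypothetical wrapper names, the manifest URL and directories are placeholders, and the REPO variable exists just so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Command to invoke; override with REPO=echo to print instead of run.
REPO=${REPO:-repo}

# Create a local mirror of everything in the manifest at URL.
make_mirror() {
    url=$1
    dir=$2
    mkdir -p "$dir" || return 1
    ( cd "$dir" && "$REPO" init -u "$url" --mirror && "$REPO" sync )
}

# Create a client checkout that references an existing local mirror.
make_client() {
    url=$1
    mirror=$2
    dir=$3
    mkdir -p "$dir" || return 1
    ( cd "$dir" && "$REPO" init -u "$url" --reference="$mirror" && "$REPO" sync )
}

# Intended use (all arguments are placeholders):
#   make_mirror <manifest-url> ~/src/mirror
#   make_client <manifest-url> ~/src/mirror ~/src/tree2
```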

    The advantage of the symlinks is that I understand what they are doing without having to read undocumented Python source.