In my organization, we are using Google's repo tool to maintain a codebase spread over ~200 git repositories. Since compilation and test runs are quite slow, I usually have more than one checkout of that source tree on my Linux machine (e.g. one that's currently compiling, and another one where I prepare the next commit).
These checked-out source trees consume about 7.5GB each, with 5.5GB being the git object store (ordinarily in the .git folder of each repository, but repo redirects this to a .repo folder in the root of the source tree) and only 2GB for the actual working copy. So my question is: how can I (easily) make those different checkouts share their object stores, so that each git object is stored only once on my hard disk?
I know that this is possible with multiple checkouts of an individual git repository, but I am not sure how repo's redirection of the object store might affect those approaches. Simply replacing duplicate files with hardlinks will probably not work, since git stores most objects together in pack files, and those won't be identical between different checkouts even if the objects inside them are.
What I'm doing is this:
1. Run repo init to initialize the new source tree.
2. Change into .repo and create two symlinks called project-objects and projects, which point to the same-named directories in the existing tree's .repo.
3. Go back up one level and run repo sync.
repo likes this so far; if I run into any problems, I will update this answer.
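The symlink step can be scripted so it is done the same way every time. The following is a minimal Python sketch, not part of repo itself; the helper name link_object_stores is hypothetical, and it assumes repo init has already been run in the new tree but repo sync has not (so project-objects and projects do not exist there yet). It checks everything before creating any link, and refuses to overwrite existing entries.

```python
import os
from pathlib import Path

# The two directories under .repo that hold the git data repo manages.
SHARED_DIRS = ("project-objects", "projects")


def link_object_stores(existing_tree: Path, new_tree: Path) -> list[Path]:
    """Symlink new_tree/.repo/{project-objects,projects} to existing_tree's copies.

    Hypothetical helper: call it after `repo init` and before `repo sync`
    in new_tree. It refuses to overwrite anything that already exists.
    """
    src_repo = existing_tree.resolve() / ".repo"
    dst_repo = new_tree / ".repo"
    if not dst_repo.is_dir():
        raise FileNotFoundError(f"{dst_repo} not found; run `repo init` first")

    pairs = [(src_repo / name, dst_repo / name) for name in SHARED_DIRS]

    # Validate everything first so we never leave a half-linked tree behind.
    for target, link in pairs:
        if not target.is_dir():
            raise FileNotFoundError(f"{target} does not exist")
        if link.exists() or link.is_symlink():
            raise FileExistsError(f"{link} already exists; not overwriting")

    for target, link in pairs:
        # Absolute target, so the link keeps working if new_tree is moved.
        os.symlink(target, link, target_is_directory=True)
    return [link for _, link in pairs]
```

For example, link_object_stores(Path("~/src/tree1").expanduser(), Path("~/src/tree2").expanduser()) (hypothetical paths), followed by repo sync in ~/src/tree2.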
We almost certainly want to avoid running concurrent repo operations in trees that share objects this way, because those operations could in turn issue concurrent operations in the same git repositories.
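One way to enforce that rule, rather than relying on discipline, is to wrap every repo invocation in a lock shared by all the trees. This is not a repo feature: it is a generic POSIX advisory lock (fcntl.flock, so Linux/macOS only) around the command. The helper name run_exclusive and the lock-file location are hypothetical, and the lock only helps if every repo command in the sharing trees goes through the wrapper.

```python
import fcntl
import subprocess
from pathlib import Path


def run_exclusive(lock_file: Path, cmd: list[str], cwd: Path) -> int:
    """Run cmd in cwd while holding an exclusive advisory lock on lock_file.

    Every tree that shares the object store should use the same lock_file,
    so a second `repo sync` blocks until the first one has finished.
    Returns the command's exit status.
    """
    lock_file.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: create the file if needed, never truncate it.
    with open(lock_file, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another process holds it
        try:
            return subprocess.run(cmd, cwd=cwd).returncode
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Usage would look like run_exclusive(Path.home() / ".repo-shared.lock", ["repo", "sync"], tree_dir), with the same lock path in every tree.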
If this is the way to go, the obvious next step is to move the shared object store out of any .repo directory into some dedicated location, and point every tree's .repo at it via symlinks.
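A minimal sketch of that next step, under the same assumptions as the symlink approach: the store location and the helper name attach_to_store are hypothetical. For the first tree, its real project-objects and projects directories are moved into the store and replaced by symlinks; for any other tree (freshly initialized, not yet synced) only the symlinks are created. Merging two already-populated stores is deliberately not handled, and nothing repo-related should be running while this executes.

```python
import os
import shutil
from pathlib import Path

# The two directories under .repo that hold the git data repo manages.
SHARED_DIRS = ("project-objects", "projects")


def attach_to_store(tree: Path, store: Path) -> None:
    """Make tree/.repo/{project-objects,projects} symlinks into store."""
    store = store.resolve()
    store.mkdir(parents=True, exist_ok=True)
    for name in SHARED_DIRS:
        link = tree / ".repo" / name
        target = store / name
        if link.is_symlink():
            continue  # already redirected somewhere; leave it alone
        if link.is_dir():
            if target.exists():
                raise FileExistsError(
                    f"{target} exists; merging stores is not handled here")
            # First tree: its data becomes the global store.
            shutil.move(str(link), str(target))
        elif not target.is_dir():
            raise FileNotFoundError(f"neither {link} nor {target} exists")
        os.symlink(target, link, target_is_directory=True)
```

Keeping the store outside every tree means any single checkout can be deleted without breaking the others.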
It looks like the --mirror and --reference options of repo init should achieve something similar, but I can't find any documentation explaining what exactly they do, and repo help init is scant on details. It looks as if --mirror is supposed to pull down a local mirror of the repositories (not a client checkout but a special mirror), which is then referenced with --reference when checking out client trees.
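If that reading is right, the workflow would look roughly like this. This is a sketch of my interpretation only, not documented behavior: the manifest URL and directories are hypothetical, and the code just builds the repo init / repo sync command lines (with --mirror and --reference=DIR as listed by repo help init) and runs them only when dry_run is False.

```python
import shlex
import subprocess
from pathlib import Path


def mirror_commands(manifest_url: str) -> list[list[str]]:
    """Commands to create or update the local mirror (run in the mirror dir)."""
    return [
        ["repo", "init", "-u", manifest_url, "--mirror"],
        ["repo", "sync"],
    ]


def client_commands(manifest_url: str, mirror_dir: Path) -> list[list[str]]:
    """Commands to create a client checkout that references the mirror."""
    return [
        ["repo", "init", "-u", manifest_url, f"--reference={mirror_dir}"],
        ["repo", "sync"],
    ]


def run(cmds: list[list[str]], cwd: Path, dry_run: bool = True) -> list[str]:
    """Print each command as a shell line; execute it only if dry_run is False."""
    lines = []
    for cmd in cmds:
        line = shlex.join(cmd)
        print(f"(cd {shlex.quote(str(cwd))} && {line})")
        lines.append(line)
        if not dry_run:
            cwd.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, cwd=cwd, check=True)
    return lines
```

For example: run(mirror_commands(url), Path.home() / "repo-mirror") once, then run(client_commands(url, Path.home() / "repo-mirror"), tree_dir) for each checkout, adding dry_run=False once the printed commands look right.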
The advantage of the symlinks is that I understand what they are doing without having to read undocumented Python source.