A local checked-out git repository should be mirrored to a backup machine, i.e. both the .git dir and the working tree. On this backup machine filesystem snapshots will be taken to allow easy and instant recovery from arbitrary git mishaps[1].
The obvious solution is to use rsync and be done, but the regular git gc runs create new, different, large .pack files which do not play nicely with snapshotting[2]. The gc behaviour can't easily be changed for the source repo(s). Also, rsync would have to traverse everything in the .git/objects subfolder, slowing it down.
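For context, the straightforward full mirror looks something like this (host and paths are placeholders):

```
# Naive approach: mirror the whole checkout, .git and pack files included.
# /path/to/repo and backup:/srv/mirror/repo are placeholder locations.
rsync -a --delete /path/to/repo/ backup:/srv/mirror/repo/
```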
It would be more elegant to use git directly (just pushing all already-committed work to a bare repository would be easy), but that leaves out the worktree. The server-side repo config receive.denyCurrentBranch = updateInstead would not work because the worktree might not be clean.
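For reference, that option would be set on the backup side roughly like this (placeholder path); it only updates the checked-out branch while the target worktree and index are clean:

```
# On the backup machine, in a non-bare clone (placeholder path):
git -C /srv/mirror/repo config receive.denyCurrentBranch updateInstead
```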
Would something like git push'ing, and then rsync'ing the worktree plus everything in .git minus the objects subfolder, work? Ideally even an in-progress rebase, merge or cherry-pick would be replicated. I thought of server-side hooks[3] on post-receive, but these never see the client's worktree state.
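A rough sketch of that idea, assuming a non-bare repo on the backup host and placeholder paths:

```
# Hypothetical push-then-rsync scheme; hosts and paths are placeholders.
# The backup repo would need receive.denyCurrentBranch set (e.g. to "ignore"),
# since rsync, not git, is responsible for its worktree.
git push --all backup:/srv/mirror/repo
rsync -a --delete --exclude='/.git/objects' \
      /path/to/repo/ backup:/srv/mirror/repo/   # worktree + .git metadata, minus objects
```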
1: For things where even git reflog doesn't help, such as the machine dying, .git getting corrupted, or just lazy users.
2: E.g. three ~10-line commits plus a gc run resulted in ca. 500 MB of files being transferred.
3: Server-side hooks mean the repo can't be restored via a plain scp -r, but that is acceptable.
UPDATE:
Seems impossible, as e.g. jwz already found out in 2015[j]. Workarounds:
[..], there have been 3½ suggestions here:
1. Turn off pack files and gc entirely, which will cause small files to accumulate for every future change and will eventually make things slow: gc.auto 0, gc.autopacklimit 0 (see the sketch after this list).
2. Set the maximum pack size to a smaller number, so that no pack file gets too large and subsequent layers of diffs get bundled into smaller pack files: pack.packSizeLimit.
   Dissenting opinion on #2: that doesn't do what you think it does, it just slices a single large pack file into N different files with the same bits in them, so you haven't saved anything.
3. If you already have one gigantic pack file, create a .keep file next to it. New pack files will appear, but they will be diffs against that saved one, and thus smaller.
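Taken literally, those suggestions would translate into settings roughly like the following (the pack size value is an arbitrary example, not a recommendation):

```
# Suggestion 1: stop automatic gc / automatic repacking in the source repo.
git config gc.auto 0
git config gc.autoPackLimit 0
# Suggestion 2: cap the size of newly created pack files (example value).
git config pack.packSizeLimit 64m
# Suggestion 3: mark every existing pack as kept so future repacks leave it alone.
for p in .git/objects/pack/pack-*.pack; do touch "${p%.pack}.keep"; done
```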
If you want to sync the entire working tree state, you'll need to use some system outside of Git. Git intentionally does not sync the working tree state to other systems and can't be made to do so.
However, having said that, I urge you to reconsider whether you want to sync parts of the working tree such as the index. The index is not made to be transferred between machines because it contains information such as inode numbers and file timestamps. In addition, the security model of a Git repository assumes that the working tree is trusted, and the only safe operations that can be made on an untrusted repository are cloning and fetching.
However, if you really want to do so, you can do the push-and-rsync approach. I personally would take the much simpler approach of just using rsync and eating the minor performance penalty on repack, since it isn't likely to be that common. By default, git gc just creates a new pack with the new objects and doesn't repack all of the existing packs unless there are more than gc.autoPackLimit (default 50) packs, so 98% of the time you'll just rsync a single new pack and delete the old loose objects, plus whatever's in the working tree.
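To sanity-check that on a given repo, one can compare the current pack count against the configured limit:

```
# Number of packs currently on disk, and the configured threshold
# (no output from git config means the default of 50 applies).
ls .git/objects/pack/*.pack | wc -l
git config --get gc.autoPackLimit
```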