Git - Have untracked files in online repository


I'm using git (bitbucket) to source control my linux configuration files. All the files are in the directory ~/.cfg/. Then I additionally have some local configuration files in ~/.cfg/local/ which are supposed to be different from machine to machine.

I would like to keep a copy of the local files in my online repository as a kind of sample local config but would like to otherwise not track the files. I don't really care whether they get cloned with git clone though, either way is fine.

I tried following this answer but that removes the files from the online repository.

I also tried the solution outlined in this blog post, which worked better, but unfortunately has 2 drawbacks: 1) it has to be repeated on each machine and 2) it does not actually unfollow the files. So if I ever accidentally upload a local config from some machine (forgetting to run the command from the post), the next git pull on any other machine will overwrite that machine's local configuration.


To summarize, I would like a solution that does the following:

  1. It keeps the initial upload of the entire ~/.cfg/ (including ~/.cfg/local/) in the online repository.
  2. It pushes the contents of ~/.cfg/ but not the contents of ~/.cfg/local/ whenever I do the standard git add -A; git commit -m "asdf"; git push
  3. It pulls the contents of ~/.cfg/ but not the contents of ~/.cfg/local/ when I git pull.

Solution

  • I would like a solution that does [things Git can't do]

    Sorry, but the answer is: No, Git can't do that. You can get close, but it's not fun: it requires work on the part of everyone who runs git clone, and from then on there are repeated encounters that can cause burns. That's why the standard method is the one recommended in this answer to Can I 'git commit' a file and ignore its content changes?

    It may help to understand why Git can't do that. Let's look more specifically at what "that" is:

    • keep the initial upload of the entire ~/.cfg/ (including ~/.cfg/local/) in the online repository.

    This, you can do. But the phrasing is odd, because Git does not store files. Git stores commits, which contain files. That might seem like mere semantics, but then again, it's "mere semantics" as to whether "hot" water is nice for a shower (40°C / 104°F: hot, but not scalding), or will give you second-degree burns (95°C / 203°F: near boiling, at standard pressure).

    So, you can have a commit that contains files including cfg/foo and cfg/local/bar. So far, no real problem. The one catch is that you cannot have a commit that contains an empty directory cfg/local/, as Git stores only the files themselves in each commit, not the containing directories. It assumes that anyone using the repository later will create directories automatically as needed: whenever there's a file to be stored, its name forces that future / other Git to call os.mkdir, or whatever it is that creates a directory, to contain that file.
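    A minimal sketch of that point, run in a throwaway repository (the temporary directory and the name repo are made up for illustration): after git add -A, the index lists the file but has no entry for the empty directory.

```shell
set -e
tmp=$(mktemp -d)               # scratch area, removed when the script exits
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q repo
cd repo
mkdir -p cfg/local             # an empty directory
echo "setting=1" > cfg/foo     # an ordinary file

git add -A
tracked=$(git ls-files)        # the paths recorded in the index
echo "$tracked"                # prints: cfg/foo
```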

    • push the contents of ~/.cfg/ but not the contents of ~/.cfg/local/ whenever I do the standard git add -A; git commit -m "asdf"; git push

    Here's the first problem, where those "mere" semantics are at least a little scalding: Git doesn't push files. Git pushes commits.

    You have three commands here. The first one, git add -A, tells Git: Update the index to match my work-tree, by refreshing the index copy of every file already recorded there, adding every untracked file that isn't ignored, and dropping entries for files I've deleted. The second one, git commit, tells Git: Make a new commit using the files that are stored in the index. The third, git push, tells Git: Send some commit(s) to some other Git, then ask that other Git to set one or more of its references, such as its refs/heads/master (its master branch), to some hash ID.
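    To see that git commit snapshots the index rather than the work-tree, here is a small sketch in a scratch repository. The file name foo and the demo identity are placeholders, not anything from the question.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q repo
cd repo

echo "v1" > foo
git add -A                     # the index copy of foo now holds "v1"
echo "v2" > foo                # the work-tree changes; the index does not

# Placeholder identity so the commit works without any global config.
git -c user.name=demo -c user.email=demo@example.com commit -q -m "asdf"

committed=$(git show HEAD:foo) # what the new commit actually stores
worktree=$(cat foo)
echo "committed: $committed"   # prints: committed: v1
echo "work-tree: $worktree"    # prints: work-tree: v2
```

    The commit holds v1 because that is what was in the index when git commit ran; the later edit never reached it.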

    This brings in this new term, the index, and that's where the trouble starts.

    If your cfg/local/bar file is in your index, it will be in your commits. If it is not in your index, it will not be in your commits. That's as simple as it gets, but its implications are nasty:

    • You can remove the file from your index without touching the work-tree version (git rm --cached cfg/local/bar), but this is going to cause a future problem.

    • Or, you can set the --assume-unchanged or --skip-worktree bits on the copy of the file that's in your index. This is almost good enough, but not quite. (Incidentally, the two are more or less equivalent, but "skip worktree" is the one that's intended for this kind of use—except that its true intent is really for use in sparse checkout. I'll write "skip worktree" below but this really means either one.)

    Setting the bit requires that you run a command manually after git clone. The index is private to your copy of the repository, so everyone who runs git clone must run this git update-index command too, at least once, right after git clone. (Git will not let you automate this through Git itself, though of course you can write a script to do it and distribute the script.)
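    The "private to your copy" part can be checked with git ls-files -v, which prints S in front of skip-worktree entries and H in front of ordinary ones. This is only a sketch: the repository names first and second and the demo identity are invented for the example.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q first
cd first
mkdir -p cfg/local
echo "sample" > cfg/local/bar
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Set the bit in this repository's index only.
git update-index --skip-worktree cfg/local/bar
flag_first=$(git ls-files -v cfg/local/bar)
echo "$flag_first"             # prints: S cfg/local/bar

# A fresh clone builds its own index, so the bit is not set there.
cd ..
git clone -q first second
flag_second=$(git -C second ls-files -v cfg/local/bar)
echo "$flag_second"            # prints: H cfg/local/bar
```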

    As you've probably already seen, this only almost works.

    • pull the contents of ~/.cfg/ but not the contents of ~/.cfg/local/ when I git pull

    Once again, Git will burn you here. The problem is that git pull is not really a thing of its own: it means run git fetch, then run a second Git command, and that second command is going to cause trouble.

    The second Git command is normally git merge, and we'll assume for now that it is. The other option, git rebase, is worse for you, as rebase is essentially repeated git cherry-pick with each cherry-pick operation itself being a merge, resulting in multiple merges.

    Merges, like commits, happen in or through the index. Git loads all the files from three commits into the index, pairing up files in two separate steps (base vs "ours", and base vs "theirs"), and then combining the pairings. So this merges each file that's in the index, or, if a file that was in an earlier commit isn't in the index now, removes or renames it.

    This means that if a file cfg/local/bar exists in the merge base commit and in "their" commit—and it will need to be there, if you want an initial git clone to populate cfg/local with cfg/local/bar—then it needs to exist in the "ours" commit as well, otherwise Git will insist on removing it to keep our change. That, in turn, means that if they have changed their copy in their commit, Git will want to apply their change to your copy in your commit too.

    If you've used git update-index to fuss with the --skip-worktree flag, you've been re-committing the original version of cfg/local/bar all along. The flag just tells Git: Hey, don't look at my own version of this file, just assume that the copy in the index is still correct. This affects the git add -A step: instead of Update all files that are listed in the index, it actually does: Update all files that aren't specially marked. You can change cfg/local/bar all you like, and git add -A will skip over the update: it won't copy your work-tree cfg/local/bar back into the index, instead keeping the copy it put into the index back when you first had git clone run git checkout for you.

    So all of your commits have a cfg/local/bar, but the contents these commits store in that cfg/local/bar, in each commit, are the same contents you got when you ran git clone, even if you've changed the work-tree copy. Your skip-worktree bit told your Git to just leave the index copy of cfg/local/bar alone, which it has done.
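    A short sketch of that behavior, again in a scratch repository with a placeholder identity: the work-tree copy of cfg/local/bar is edited after the bit is set, yet the next commit still stores the original contents.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Placeholder identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
mkdir -p cfg/local
echo "a" > cfg/foo
echo "sample" > cfg/local/bar
git add -A
git commit -q -m "initial"

git update-index --skip-worktree cfg/local/bar

echo "b" > cfg/foo                       # an ordinary change
echo "machine-specific" > cfg/local/bar  # a local-only change
git add -A                               # skips the marked file
git commit -q -m "asdf"

committed=$(git show HEAD:cfg/local/bar)
worktree=$(cat cfg/local/bar)
echo "committed: $committed"   # prints: committed: sample
echo "work-tree: $worktree"    # prints: work-tree: machine-specific
```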

    But now it's merge time, and they have changed their cfg/local/bar for whatever reason (the reason doesn't matter; what matters is that they did change it). Your Git is faced with the job of combining your changes (none) with their changes (some). It does so by taking the only changes, theirs of course, and now your Git will insist on copying the updated cfg/local/bar out into your work-tree. Depending on the flag and the Git version, this either overwrites your cfg/local/bar or stops the merge with a complaint that your local changes would be overwritten. Either way, that's the pain point: that's where this approach burns you.

    If they never (not ever, not once) change their cfg/local/bar, this approach (setting skip-worktree) will actually work. But that depends on the kindness of strangers, or at least on the idea that the local config in cfg/local/bar is exactly the same in every commit, ever ... in which case, what was the point of committing it at all?

    But if they ever do change it, you'll get burned (mild or otherwise) when you merge their change with your lack-of-change, because Git will want to overwrite your cfg/local/bar with their updated one.

    The alternative, in which you remove your cfg/local/bar from your index early on, is worse: now every commit you push doesn't have the file at all. Git views this as a command: When going from a commit that does have the file, to one that doesn't have the file, remove the file. So if you take this approach, you're the one who changed the file! You told everyone else: Remove this file!
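    Here is a sketch of that "Remove this file!" effect, using two scratch repositories. The names upstream and other-machine and the demo identity are invented for the example, and the second machine's copy is deliberately left unmodified so that the pull goes through cleanly.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Placeholder identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
cd upstream
mkdir -p cfg/local
echo "a" > cfg/foo
echo "sample" > cfg/local/bar
git add -A
git commit -q -m "initial"
cd ..

git clone -q upstream other-machine

# On the first machine: drop the file from the index, keep the work-tree copy.
cd upstream
git rm -q --cached cfg/local/bar
git commit -q -m "stop tracking cfg/local/bar"
still_here=$([ -f cfg/local/bar ] && echo yes || echo no)
echo "upstream keeps its copy: $still_here"          # prints: ... yes

# On the other machine: pulling that commit deletes the file.
cd ../other-machine
git pull -q --ff-only
gone_there=$([ -f cfg/local/bar ] && echo yes || echo no)
echo "other machine keeps its copy: $gone_there"     # prints: ... no
```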

    The only truly, 100% guaranteed, correct way to deal with this is: Never commit the file in the first place. If every commit in the repository doesn't have cfg/local/bar, that file will never be put into the index. If that name is listed in a .gitignore as well, no automatic "add all files" will add it to the index, so it won't be in future commits. That means it won't be in there when you start, nor when you finish. Git will never want to merge it, nor overwrite your copy of it. It will always be an untracked-and-ignored file, existing in your work-tree, but not in any of your commits.
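    As a sketch of the never-committed setup, assuming the ignore rule lives in a top-level .gitignore and the repository layout matches the paths used above: git add -A picks up everything except the ignored directory.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q repo
cd repo
mkdir -p cfg/local
echo "a" > cfg/foo
echo "machine-specific" > cfg/local/bar

# Ignore the whole local directory before anything in it is ever added.
echo "cfg/local/" > .gitignore

git add -A
tracked=$(git ls-files)
echo "$tracked"                # lists .gitignore and cfg/foo only

# check-ignore exits 0 when the path matches an ignore rule.
if git check-ignore -q cfg/local/bar; then
    ignored=yes
else
    ignored=no
fi
echo "cfg/local/bar ignored: $ignored"   # prints: cfg/local/bar ignored: yes
```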

    Of course, this means there's a little bit of initial pain: every time you run git clone <url> you must also do: cp -r .cfg/local-committed/ .cfg/local. But if you were going to use --skip-worktree, then every time you run git clone <url> you must follow that immediately with git update-index --skip-worktree .cfg/local/bar. So it's exactly the same amount of pain as the bad alternative, without any of its badness.

    Moreover, if you're in control of the software, you can set up the software so that, if .cfg/local/ does not exist when you first run the program, the program creates .cfg/local/ by copying from .cfg/local-committed/. Then that pain of "first setup" goes away too! That's why committing the default configuration into a separate file, that the user either manually or automatically copies to the local configuration file, which remains an untracked file forever, is the correct solution.
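    One way that first-run copy might look, as a rough sketch. The function name ensure_local_config is hypothetical, and the demo runs against a scratch directory rather than a real ~/.cfg.

```shell
set -e

# Hypothetical helper: seed the untracked local config from the committed
# defaults, but only when no local config exists yet.
ensure_local_config() {
    cfg_root=$1
    if [ ! -d "$cfg_root/local" ]; then
        cp -r "$cfg_root/local-committed" "$cfg_root/local"
    fi
}

# Demo in a scratch directory standing in for ~/.cfg.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/.cfg/local-committed"
echo "sample" > "$tmp/.cfg/local-committed/bar"

ensure_local_config "$tmp/.cfg"          # first run: copies the defaults
first_run=$(cat "$tmp/.cfg/local/bar")
echo "$first_run"                        # prints: sample

echo "machine-specific" > "$tmp/.cfg/local/bar"
ensure_local_config "$tmp/.cfg"          # later runs: leaves local edits alone
second_run=$(cat "$tmp/.cfg/local/bar")
echo "$second_run"                       # prints: machine-specific
```

    Checking for the directory rather than for individual files keeps the sketch short; a real setup might prefer to copy only the files that are missing.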