
How to prevent git from committing a specific set of local changes


I often face the situation where I have to keep local configuration and small code changes that I never want Git to commit, e.g., a different API_BASE_URL or some mock data for a feature I'm currently working on. As far as I can see, not all of these are changes that .env files can cover.

I mostly manage these just fine manually, but sometimes some of the changes slip into a pushed commit, which is annoying and would be easily avoidable with better tooling.

Is there a way to tell Git (or perhaps Visual Studio Code or Spacemacs) something like this?

Hey buddy, I've got these local changes that I'm constantly fiddling with, don't bother staging or committing these ever.

Thanks!


Solution

  • The main mechanism built into Git that can help here is the pair of smudge and clean filters. See the Q&A that ElpieKay linked in a comment, "git smudge/clean filter between branches", or jump directly to VonC's answer here.

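    Your smudge and clean filters can do anything you like: they're just general-purpose text filters that read the file's contents on standard input and write the transformed contents to standard output.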

    • When Git copies a file from the index to your work-tree, it will run the text through your smudge filter. The smudged text is what will be visible in your work-tree.

    • When Git copies a file from your work-tree to the index, it will run the text through the clean filter. The cleaned text is what will appear in the index copy of the file.[1] The index copy of the file is the one that will get committed.
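A minimal sketch of such a filter pair in Python, assuming (hypothetically) that the tracked file contains a line of the form API_BASE_URL = "..." and that you want a local URL in your work-tree but the shared URL in every commit. The two URLs, the script name url_filter.py, and the line format are placeholders, not something Git or this answer prescribes. Git would run it as python3 url_filter.py clean or python3 url_filter.py smudge, feeding the file on standard input.

```python
#!/usr/bin/env python3
"""Hypothetical clean/smudge filter for a local API_BASE_URL override.

Assumptions: the tracked file has lines like  API_BASE_URL = "..."  and both
URLs below are placeholders. Git pipes the file through stdin/stdout.
"""
import re
import sys

COMMITTED_URL = "https://api.example.com"  # placeholder: value every commit should carry
LOCAL_URL = "http://localhost:8000"        # placeholder: value you only want locally

# Captures the text around the quoted value so only the URL itself changes.
_ASSIGNMENT = re.compile(r'^(API_BASE_URL\s*=\s*")[^"]*(")', re.MULTILINE)


def _with_url(text: str, url: str) -> str:
    # A function replacement stops re from treating backslashes in `url` as escapes.
    return _ASSIGNMENT.sub(lambda m: m.group(1) + url + m.group(2), text)


def clean(text: str) -> str:
    """Work-tree -> index: put the committed value back before Git stores the blob."""
    return _with_url(text, COMMITTED_URL)


def smudge(text: str) -> str:
    """Index -> work-tree: swap in the local value when Git writes the file out."""
    return _with_url(text, LOCAL_URL)


if __name__ == "__main__" and len(sys.argv) == 2:
    filters = {"clean": clean, "smudge": smudge}
    if sys.argv[1] not in filters:
        sys.exit("usage: url_filter.py clean|smudge")
    sys.stdout.write(filters[sys.argv[1]](sys.stdin.read()))
```

The trade-off of this design is that the clean filter always wins: while the filter is active, you cannot commit a genuine change to API_BASE_URL through this file without editing the filter itself.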

    When you switch from this commit to some other commit, if the other commit's copy of that file differs from this commit's copy, Git will first copy the file from the other commit to the index (see footnote 1 again), then extract that file from the index to your work-tree, pushing the extracted text through your smudge filter.

    The .gitattributes file lists which files get run through which filters, and your main .git/config or ~/.gitconfig file describes the available filter programs.
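The wiring described above can be sketched as a small hypothetical Python helper. It adds a line such as config.py filter=localurl to an attributes file and registers the two commands with git config filter.<driver>.clean and git config filter.<driver>.smudge, which (without --global) write to the repository's .git/config. The driver name localurl, the file pattern, and the filter commands are placeholders.

```python
"""Hypothetical helper that wires a clean/smudge filter driver into a repo."""
import subprocess
import sys
from pathlib import Path
from typing import List


def attributes_with_filter(existing: str, pattern: str, driver: str) -> str:
    """Return attributes text containing `<pattern> filter=<driver>` exactly once."""
    line = f"{pattern} filter={driver}"
    if line in (entry.strip() for entry in existing.splitlines()):
        return existing  # already present; leave the text untouched
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + line + "\n"


def filter_config_commands(repo: Path, driver: str,
                           clean_cmd: str, smudge_cmd: str) -> List[List[str]]:
    """git config commands that define the driver in the repo's .git/config."""
    base = ["git", "-C", str(repo), "config"]
    return [base + [f"filter.{driver}.clean", clean_cmd],
            base + [f"filter.{driver}.smudge", smudge_cmd]]


def install_filter(repo: Path, pattern: str, driver: str, clean_cmd: str,
                   smudge_cmd: str, attrs_file: str = ".gitattributes") -> None:
    attrs = repo / attrs_file
    attrs.parent.mkdir(parents=True, exist_ok=True)
    existing = attrs.read_text() if attrs.exists() else ""
    attrs.write_text(attributes_with_filter(existing, pattern, driver))
    for cmd in filter_config_commands(repo, driver, clean_cmd, smudge_cmd):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python3 setup_filter.py path/to/repo  (all names are placeholders)
    install_filter(Path(sys.argv[1]), "config.py", "localurl",
                   "python3 url_filter.py clean", "python3 url_filter.py smudge")
```

Since you probably don't want to commit the attribute line either, note that Git also reads per-repository attributes from .git/info/attributes, which is never committed; passing attrs_file=".git/info/attributes" keeps the whole setup local.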

    An alternative mechanism, which is probably not suitable for your particular case, works like this: after Git has extracted files from some commit into both the index and your work-tree (i.e., after some initial git checkout), you can use git update-index to mark the index copy of the file specially. From this point on, Git operations that would compare the index copy and the work-tree copy mostly just assume that the index copy is correct and matches the work-tree copy.

    The flags that you can set here are called assume-unchanged and skip-worktree. To set one of them, you would use git update-index --skip-worktree path (or git update-index --assume-unchanged path). The two flags mostly do the same thing, but their internal purposes differ: assume-unchanged is meant to let Git skip lstat calls on systems where those calls are very slow, and skip-worktree is meant for the sparse checkout feature. If you're actually using sparse checkout, Git will flip the skip-worktree bit on and off itself, so you'd want to use assume-unchanged instead. Otherwise, the general recommendation is to use the skip-worktree flag (though in practice both work out the same).
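If you toggle these flags often, a thin wrapper can save some typing. This is a hypothetical Python sketch (the function names are made up); it only builds and runs the real git update-index options --skip-worktree, --no-skip-worktree, --assume-unchanged, and --no-assume-unchanged.

```python
"""Hypothetical wrapper around the `git update-index` flag options."""
import subprocess
import sys
from typing import List

FLAGS = ("skip-worktree", "assume-unchanged")


def update_index_args(path: str, flag: str = "skip-worktree", on: bool = True) -> List[str]:
    """Build the command that sets (on=True) or clears (on=False) `flag` on `path`."""
    if flag not in FLAGS:
        raise ValueError(f"unknown flag {flag!r}; expected one of {FLAGS}")
    option = f"--{flag}" if on else f"--no-{flag}"
    # `--` ends option parsing, so a path that starts with '-' is still a path.
    return ["git", "update-index", option, "--", path]


def set_flag(path: str, flag: str = "skip-worktree", on: bool = True) -> None:
    subprocess.run(update_index_args(path, flag, on), check=True)


if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[1] in ("on", "off"):
    # Usage: python3 flag.py on|off <path>   (skip-worktree only, for brevity)
    set_flag(sys.argv[2], on=(sys.argv[1] == "on"))
```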

    Note that once you set one of these flags, the fact that you have done so is largely invisible.[2] You must remember that you did it; if you forget, it will produce extremely puzzling behavior. Using these bits this way is also painful because sometimes Git does need to replace the index copy of a file, e.g., when switching to a commit whose copy of the file doesn't match the copy currently loaded into your index. Overwriting the index copy means Git wants to overwrite the work-tree copy as well, so if the flag is set and the work-tree copy exists and doesn't match the index copy, Git will refuse to proceed. To get Git to proceed, you generally need to turn off the flag, deal with the file somehow, switch commits, deal with the file again, turn the flag back on, and resume working.
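The off, switch, on sequence above can be scripted. This hypothetical Python sketch assumes that "dealing with the file" means stashing your local edits with git stash push -- <path> before the switch and restoring them with git stash pop afterward; that is one reasonable choice, not the only one. It only stashes when git diff --quiet reports a difference, so it never pops an unrelated older stash.

```python
"""Hypothetical helper: switch commits while a skip-worktree file has local edits.

Assumption: "dealing with the file" means stashing the edits before the switch
and re-applying them afterward. Names are illustrative only.
"""
import subprocess
import sys
from typing import List


def switch_steps(path: str, target: str, has_local_edits: bool) -> List[List[str]]:
    """Commands to run after the skip-worktree flag has been cleared on `path`."""
    steps: List[List[str]] = []
    if has_local_edits:
        steps.append(["git", "stash", "push", "--", path])  # park the local edits
    steps.append(["git", "checkout", target])                # now safe to switch
    if has_local_edits:
        steps.append(["git", "stash", "pop"])              # re-apply them (may conflict)
    steps.append(["git", "update-index", "--skip-worktree", "--", path])
    return steps


def switch_keeping_local_edits(path: str, target: str) -> None:
    subprocess.run(["git", "update-index", "--no-skip-worktree", "--", path], check=True)
    # `git diff --quiet` exits 0 if the work-tree copy matches the index, 1 if not.
    rc = subprocess.run(["git", "diff", "--quiet", "--", path]).returncode
    if rc not in (0, 1):
        raise RuntimeError(f"git diff failed with exit status {rc}")
    for cmd in switch_steps(path, target, has_local_edits=(rc == 1)):
        # check=True stops here (leaving the flag off) if any step fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    switch_keeping_local_edits(sys.argv[1], sys.argv[2])
```

If git stash pop hits a conflict, the script stops before re-flagging, which is deliberate: you resolve the conflict by hand and then set the flag again yourself.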


    [1] The "copy" of the file in the index is actually a copy in the repository, in the form of a blob object. The index holds a reference to the blob object. If and when you make a new commit, the new commit holds a reference to this same blob object (through a tree object). When you run git add, Git cleans the file, builds a temporary blob object from the result, finds its blob hash ID, and finds out if the object already exists. If the object already exists, git add is basically done: all we needed was its hash ID, and that goes into the index. If the object doesn't exist, git add puts the object into the objects database, and now we're done: the new object's hash ID goes into the index.
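The hash lookup described in this footnote can be reproduced in a few lines of Python. For a repository using the default SHA-1 object format, a blob's ID is the SHA-1 of the header blob <size>, a NUL byte, and then the (already cleaned) file contents. Repositories created with the newer SHA-256 object format use a different hash, which this sketch does not handle.

```python
import hashlib


def blob_hash(data: bytes) -> str:
    """Object ID of a blob with this content in a SHA-1 repository.

    Git hashes the header b"blob <size>\\0" followed by the raw bytes; for a
    filtered file, `data` must be the cleaned content, not the work-tree text.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

Because the ID depends only on content, adding identical content twice yields the same ID, which is why the second git add finds an existing object. You can cross-check a file with git hash-object <file>, which by default applies the same input filters git add would.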

    This means that when you git add a file several times before committing, you may generate a bit of garbage: unused blobs. The garbage will eventually be cleaned away by git gc. Git makes garbage objects all the time on its own, so there's no need to worry about this. The exception is if you accidentally add some huge file, e.g., a 500 GB file: that garbage object will persist for at least two weeks by default, using up a noticeable amount of disk space.

    [2] I wrote a script, git-flagged, to find (and optionally de-flag) my files. Occasionally I find myself in the annoying situation of having to flag some files with skip-worktree for a while, and remembering which ones they were was painful, hence the script. Now I can run git flagged to see them more easily.
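This is not the author's git-flagged script, just a hypothetical Python sketch of the same idea. It relies on git ls-files -v, which prints a tag letter before each path: S marks a skip-worktree entry, and a lowercase letter marks an assume-unchanged entry. The -z option makes Git separate entries with NUL bytes instead of quoting unusual paths.

```python
"""Hypothetical `git flagged`-style lister (not the author's git-flagged script)."""
import subprocess
import sys
from typing import Dict, List


def parse_ls_files_v(output: str) -> Dict[str, List[str]]:
    """Map each flagged path to its flags, given `git ls-files -v -z` output.

    Tag "S" means skip-worktree; a lowercase tag means assume-unchanged,
    so "s" means both.
    """
    flagged: Dict[str, List[str]] = {}
    for entry in output.split("\0"):
        if len(entry) < 3:               # skip the empty string after the final NUL
            continue
        tag, path = entry[0], entry[2:]  # each entry is "<tag> <path>"
        flags: List[str] = []
        if tag.upper() == "S":
            flags.append("skip-worktree")
        if tag.islower():
            flags.append("assume-unchanged")
        if flags:
            flagged[path] = flags
    return flagged


def flagged_files(repo: str = ".") -> Dict[str, List[str]]:
    result = subprocess.run(["git", "-C", repo, "ls-files", "-v", "-z"],
                            check=True, capture_output=True, text=True)
    return parse_ls_files_v(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    for path, flags in sorted(flagged_files(sys.argv[1]).items()):
        print(f"{path}: {', '.join(flags)}")
```

Running it with a repository path as its only argument prints each flagged path with its flags; to de-flag one, pass it to git update-index --no-skip-worktree (or --no-assume-unchanged).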