Change contents of file in git history keeping tags


I have many git repos with DVC. DVC has a file, .dvc/config, that refers to the remote (cloud storage). I want to migrate from one remote to another, so one of the steps is to edit the git history of .dvc/config so that it always refers to the correct (new) remote. As a rule, DVC releases (commits) are tagged. So I want to change the .dvc/config file in the tagged commits but also keep the tags where they were. To sum up, I want to change ONLY the contents of .dvc/config and leave everything else (including git metadata) untouched as much as possible. How can I do this (preferably scripted somehow, in bash or Python)?

I do not know how to edit git history in this situation; the correct algorithm doesn't come to mind. I tried rebase and other git commands but didn't succeed: new commits are made with new hashes, while the tags stay on the old commits, argh.


Solution

  • By design (and for good reason), it's impossible to modify a Git commit. You can only create new commits.

    Your approach with rebase was actually correct here. Rebase walks through your commit history and allows you to arbitrarily modify files, run commands (see rebase --exec), drop commits, modify commit metadata, and even insert new commits along the way.
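
As a rough illustration of the "modify files" step, here is a minimal Python sketch that rewrites the remote URL in the text of a DVC config file. The function name, the sample config, and both bucket URLs are assumptions made up for this example; the function only touches lines of the form `url = <old_url>` and leaves everything else byte-for-byte as it was.

```python
def rewrite_remote_url(config_text: str, old_url: str, new_url: str) -> str:
    """Return config_text with every `url = old_url` line pointing at new_url."""
    out = []
    for line in config_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "url" and value.strip() == old_url:
            # Preserve the original indentation and line ending.
            newline = line[len(line.rstrip("\r\n")):]
            line = f"{key}= {new_url}{newline}"
        out.append(line)
    return "".join(out)


# Illustrative config only; the remote name and URLs are hypothetical.
SAMPLE = (
    "[core]\n"
    "    remote = storage\n"
    "['remote \"storage\"']\n"
    "    url = s3://old-bucket/dvc\n"
)

updated = rewrite_remote_url(SAMPLE, "s3://old-bucket/dvc", "s3://new-bucket/dvc")
print(updated)
```

A script built around a function like this could be run on .dvc/config at each replayed commit, for example from `git rebase --exec` with the exec command also amending the commit. Treat that wiring as something to try on a throwaway clone first, since later commits that touch the same line may conflict.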

    The one thing that you cannot do, however, is make any changes and keep the Git hashes the same. Again, this is by design and for good reason.

    Let's say you have a sequence of commits A ← B ← C ← D, and you want to modify commit A, leaving the rest unchanged. Unfortunately, you cannot modify A, you can only make a brand new commit, which we will call A*. So now you want to modify B to point to A* instead of A as its parent, but you cannot modify B, you can only make a brand new commit, which we will call B*. This continues onward until you have rewritten your entire history to A* ← B* ← C* ← D*. All of the old commits A, B, C, and D are still there, but unless you have another branch or tag pointing to D or one of its descendants, they will now be "unreachable" and eventually they will all be garbage collected.
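
The cascade described above can be reproduced with a toy model in Python. This is not Git's real object format; `commit_id` and `build_chain` are made-up names, and the only property borrowed from Git is that a commit's id is computed over its own content together with its parent's id.

```python
import hashlib
from typing import List, Optional


def commit_id(content: str, parent_id: Optional[str]) -> str:
    """Toy stand-in for a commit hash: covers the content and the parent's id."""
    payload = f"parent {parent_id or ''}\n{content}".encode()
    return hashlib.sha1(payload).hexdigest()


def build_chain(contents: List[str]) -> List[str]:
    """Hash a linear history, oldest commit first, and return all ids."""
    ids = []
    parent = None
    for content in contents:
        parent = commit_id(content, parent)
        ids.append(parent)
    return ids


old = build_chain(["A", "B", "C", "D"])
new = build_chain(["A*", "B", "C", "D"])   # only the first commit was edited

# Compare ids position by position: every commit after the edit differs too.
changed = [o != n for o, n in zip(old, new)]
print(changed)
```

Editing only the third commit instead (`build_chain(["A", "B", "C*", "D"])`) leaves the first two ids identical to the old ones, which matches the rule that only the edited commit and its descendants are rewritten.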

    If you want the tags to end up on the rewritten commits, you need to recreate them with git tag -f, pointing each tag at its corresponding new commit. See e.g. "How can I move a tag on a git branch to a different commit?"

    Consider that you might want to leave the old tags in place, creating new tags instead of replacing the old ones.
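
Both choices (moving the tags, or adding new tags next to the old ones) can be planned with a small Python helper. This is only a sketch: `plan_tag_commands`, the sample tag names, and the commit ids are hypothetical, and it assumes you already have a mapping from each old commit to its rewritten counterpart, which the rebase itself does not hand you.

```python
from typing import Dict, List, Optional


def plan_tag_commands(
    tags: Dict[str, str],
    rewritten: Dict[str, str],
    new_suffix: Optional[str] = None,
) -> List[List[str]]:
    """Build `git tag` commands that point tags at rewritten commits.

    tags:       tag name -> old commit id
    rewritten:  old commit id -> new commit id (assumed to be known)
    new_suffix: if given, create `<tag><suffix>` and keep the old tag;
                otherwise move the existing tag with `git tag -f`.
    """
    commands = []
    for name, old_commit in sorted(tags.items()):
        new_commit = rewritten.get(old_commit)
        if new_commit is None:
            continue  # this tag's commit was not rewritten; leave it alone
        if new_suffix is None:
            # Note: this form creates a lightweight tag, so an annotated
            # tag's message is not carried over.
            commands.append(["git", "tag", "-f", name, new_commit])
        else:
            commands.append(["git", "tag", name + new_suffix, new_commit])
    return commands


# Hypothetical data for illustration.
tags = {"v1.0": "aaa111", "v1.1": "ccc333", "wip": "zzz999"}
rewritten = {"aaa111": "aaa222", "ccc333": "ccc444"}

for cmd in plan_tag_commands(tags, rewritten):
    print(" ".join(cmd))
```

Each planned command could then be executed with `subprocess.run(cmd, check=True)` inside the repository. Printing the plan first makes it easy to review before any tag is actually moved.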

    If you've pushed this repository to a remote, you will now need to either:

    1. Make a new branch for your new commits, and tell other users to start using that instead of the old branch.
    2. Rename the old branch, force push to your branch, and tell other users to deal with it however they see fit. See: "Other consequences of `git push --force`?"
    3. Create a merge commit between the old branch and your newly-rebased/edited branch using the ours strategy. See: https://stackoverflow.com/a/2862938/2954547

    Option 3 might be the lowest impact from the perspective of other users.

    Finally, one option might be to avoid putting the remote storage details in .dvc/config, instead using .dvc/config.local. This way you don't have to rewrite Git history if you migrate to a new remote host. The tradeoff here is that the remote host is no longer stored in version control, which might cause other problems. This is very similar to the situation of a Git submodule that migrated to a new Git host, which is not hard to do in new commits, but poses the same challenges if you want to maintain a history of correctly-functioning code.