I have many Git repos that use DVC. DVC has a file, .dvc/config, that refers to a remote (cloud storage). I want to migrate from one remote to another, so one of the steps is to edit the Git history of .dvc/config so that it always refers to the correct (new) remote.
As a rule, DVC releases (commits) are tagged. So I want to change the .dvc/config file in the tagged commits but also keep the tags where they were.
To sum up, I want to change ONLY the contents of .dvc/config and leave everything else (including Git metadata) untouched as much as possible. How can I do this? It would be better if it were scripted somehow (bash/Python).
I do not know how to edit Git history in this situation; the correct algorithm doesn't come to mind. I tried rebase and other Git commands but didn't succeed: new commits are made with new hashes, and the tags stay on the old commits, argh..
By design (and for good reason), it's impossible to modify a Git commit. You can only create new commits.
Your approach with rebase was actually correct here. Rebase walks through your commit history and allows you to arbitrarily modify files, run commands (see rebase --exec), drop commits, modify commit metadata, and even insert new commits along the way.
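As a rough illustration of the rebase --exec approach, here is a minimal Python sketch. It assumes a linear history and that the old remote URL appears literally in .dvc/config. The two URLs, the helper script path, and all function names are hypothetical placeholders, not part of Git or DVC. If later commits touch the same lines of .dvc/config, the rebase may stop with conflicts that need resolving by hand.

```python
import shlex
import sys
from pathlib import Path
from typing import List

# Hypothetical placeholder URLs; substitute your real old and new remotes.
OLD_URL = "s3://old-bucket/dvc-store"
NEW_URL = "s3://new-bucket/dvc-store"


def replace_remote_url(config_text: str, old_url: str, new_url: str) -> str:
    """Return config_text with every literal occurrence of old_url replaced."""
    return config_text.replace(old_url, new_url)


def fix_config_file(path: Path, old_url: str, new_url: str) -> bool:
    """Rewrite the file in place; return True if its contents changed."""
    if not path.exists():
        return False  # e.g. commits made before DVC was added to the repo
    original = path.read_text()
    updated = replace_remote_url(original, old_url, new_url)
    if updated != original:
        path.write_text(updated)
    return updated != original


def build_rebase_command(helper_script: str) -> List[str]:
    """Build a `git rebase --root --exec ...` argv (nothing is executed here).

    After each replayed commit, the --exec step runs the helper script and
    folds any resulting change to tracked files into that commit.
    """
    exec_step = (
        f"{shlex.quote(sys.executable)} {shlex.quote(helper_script)} && "
        "git commit -a --amend --no-edit"
    )
    return ["git", "rebase", "--root", "--exec", exec_step]
```

The helper script passed to build_rebase_command would simply call fix_config_file(Path(".dvc/config"), OLD_URL, NEW_URL). Keeping that script outside the repository's working tree avoids it showing up as an untracked file during the rebase. Try this on a throwaway clone first.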
The one thing that you cannot do, however, is make any changes and keep the Git hashes the same. Again, this is by design and for good reason.
Let's say you have a sequence of commits A ← B ← C ← D, and you want to modify commit A, leaving the rest unchanged. Unfortunately, you cannot modify A, you can only make a brand new commit, which we will call A*. So now you want to modify B to point to A* instead of A as its parent, but you cannot modify B, you can only make a brand new commit, which we will call B*. This continues onward until you have rewritten your entire history to A* ← B* ← C* ← D*. All of the old commits A, B, C, and D are still there, but unless you have another branch or tag pointing to D or one of its descendants, they will now be "unreachable" and eventually they will all be garbage collected.
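The cascade described above can be reproduced with a toy model. The sketch below is not Git's real object format; it only assumes, as Git does, that a commit's ID is a hash covering both its content and its parent's ID. The function names are made up for illustration.

```python
import hashlib
from typing import List, Optional


def commit_id(content: str, parent: Optional[str]) -> str:
    """Toy stand-in for a commit hash: covers the content and the parent ID."""
    payload = f"parent={parent}\n{content}".encode()
    return hashlib.sha1(payload).hexdigest()


def build_chain(contents: List[str]) -> List[str]:
    """Hash a linear history; each ID depends on every earlier commit."""
    ids: List[str] = []
    parent: Optional[str] = None
    for content in contents:
        parent = commit_id(content, parent)
        ids.append(parent)
    return ids


old_ids = build_chain(["A", "B", "C", "D"])
new_ids = build_chain(["A*", "B", "C", "D"])  # only the first commit differs

# B, C, and D have the same content as before, yet every ID is different.
changed = [old != new for old, new in zip(old_ids, new_ids)]
print(changed)  # [True, True, True, True]
```

Changing a commit in the middle of the chain instead would leave the IDs before it intact and change only that commit and its descendants.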
If you want to preserve tags after this operation, you need to recreate them with git tag -f. See e.g. "How can I move a tag on a git branch to a different commit?"
Consider that you might want to leave the old tags in place, creating new tags instead of replacing the old ones.
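One way to script the tag step is sketched below. It assumes the rewrite kept the same number of commits in the same order (a linear history with nothing dropped or inserted), so old and new commits can be paired by position. The commit lists could come from something like git rev-list --reverse on the old and new branch tips. The function names and the suffix parameter are hypothetical.

```python
from typing import Dict, List


def remap_tags(
    old_commits: List[str],
    new_commits: List[str],
    tags: Dict[str, str],
) -> Dict[str, str]:
    """Map each tag name to the rewritten commit at the same position."""
    if len(old_commits) != len(new_commits):
        raise ValueError("histories differ in length; cannot pair by position")
    old_to_new = dict(zip(old_commits, new_commits))
    # Tags that point outside the rewritten range are left out.
    return {name: old_to_new[sha] for name, sha in tags.items() if sha in old_to_new}


def tag_commands(new_tags: Dict[str, str], suffix: str = "") -> List[List[str]]:
    """Build `git tag -f` argvs; a non-empty suffix yields new tag names
    instead of moving the existing ones."""
    return [["git", "tag", "-f", name + suffix, sha] for name, sha in new_tags.items()]
```

Each returned argv can be passed to subprocess.run once you have checked the mapping. Note that these commands create lightweight tags, so annotated tags would lose their message unless recreated with their annotation.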
If you've pushed this repository to a remote, you will now need to either:
ours strategy. See: https://stackoverflow.com/a/2862938/2954547
Option 3 might be the lowest impact from the perspective of other users.
Finally, one option might be to avoid putting the remote storage details in .dvc/config, instead using .dvc/config.local. This way you don't have to rewrite Git history if you migrate to a new remote host. The tradeoff here is that the remote host is no longer stored in version control, which might cause other problems. This is very similar to the situation of a Git submodule that has migrated to a new Git host: the change is not hard to make in new commits, but it poses the same challenges if you want to maintain a history of correctly functioning code.