
revert a change to submodule in a commit


I accidentally changed the state of 2 submodules while committing changes to files in the main repository. It might have happened while rebasing, but I'm not sure, as I don't understand submodules well.

Is there a way to edit a commit and undo the changes to submodules without undoing the good changes to the files? I'm familiar with doing this with regular projects, using git rebase -i and git commit --amend, but I have no clue how to do it for submodules.

This is the messed-up commit: https://github.com/PiRK/ElectrumABC/commit/f1bf0893c1becc01b8191c4a8c37eafd26c2a29d I need contrib/tor to point again to the 7ce4ae344 commit in its remote repository, and contrib/ssl to point back to fd78df59b0 in https://github.com/openssl/openssl/

I'm happy with adding a commit that just fixes the two submodules, or changing history by editing my faulty commit (that would be best, as it avoids having an inconsistent state for a few commits).


Solution

  • To add a new commit that fixes both submodules:

    1. Check out the superproject, at the tip of the branch that ends with the "bad" commit. Run git submodule update --init if necessary to clone the two submodules.

    2. Enter each submodule and check out the correct commit, by the two raw hash IDs you listed:

      (cd contrib/tor && git checkout 7ce4ae344)
      (cd contrib/ssl && git checkout fd78df59b0)
      

      (You can use git switch --detach rather than git checkout, if you prefer; there's no difference in effect.)

    3. git status will now show the two submodules in the "changes not staged for commit" section. Add them to Git's index:

      git add contrib/tor contrib/ssl
      

      Now git status will show them in the "changes staged for commit" section.

    4. Make the new commit.

    It is possible to replace old commits with new-and-improved ones using git rebase and the like. Using git commit --amend instead of git commit in step 4 replaces just the one tip commit. In general, though, replacing old bad commits creates headaches for other people who have already picked up the bad commits, and it adds very little value. If you're quite sure nobody else has picked up the bad commits, feel free to do the extra work to make it look like no errors crept in, but nobody will really care. 😀
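For the history-rewriting route, the following is a minimal sketch of one way to do it: stop an interactive rebase at the bad commit, repoint the submodule, and amend. It runs entirely in a throwaway sandbox, so every name in it (the lib and super repositories, contrib/lib, notes.txt, the commit messages) is made up for illustration and is not taken from the real project. The protocol.file.allow setting is only there because recent Git versions refuse to clone submodules from local paths by default.

```shell
#!/usr/bin/env bash
# Sandbox demo: amend an older commit so it no longer moves a submodule.
# All repository names, paths, and commit messages here are hypothetical.
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true
sandbox=$(mktemp -d)

# Stand-in for the submodule's upstream: one "good" and one "wrong" commit.
git init -q "$sandbox/lib"
git -C "$sandbox/lib" commit -q --allow-empty -m "good state"
good=$(git -C "$sandbox/lib" rev-parse HEAD)
git -C "$sandbox/lib" commit -q --allow-empty -m "state we did not mean to record"
wrong=$(git -C "$sandbox/lib" rev-parse HEAD)

# Superproject: the submodule starts out pinned to the good commit.
git init -q "$sandbox/super"
cd "$sandbox/super"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" contrib/lib
git -C contrib/lib checkout -q "$good"
git add contrib/lib
git commit -q -m "add submodule"

# The "bad" commit: a wanted file change plus an accidental submodule move.
echo "useful change" > notes.txt
git -C contrib/lib checkout -q "$wrong"
git add notes.txt contrib/lib
git commit -q -m "bad commit"

# Later work on top of it.
echo "later work" >> notes.txt
git commit -q -am "later commit"

# Stop at the bad commit by turning the first "pick" in the todo list into "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -q -i HEAD~2

# Repoint the submodule, stage it, and fold the fix into the stopped commit.
git -C contrib/lib checkout -q "$good"
git add contrib/lib
git commit -q --amend --no-edit
git rebase --continue

# Both rewritten commits should now record the good hash.
git ls-tree HEAD~1 contrib/lib
git ls-tree HEAD contrib/lib
```

In the real repository the equivalent would be to run git rebase -i on the parent of the faulty commit, change its pick to edit by hand, check out 7ce4ae344 and fd78df59b0 inside contrib/tor and contrib/ssl, then git add, git commit --amend, and git rebase --continue. If the branch was already pushed, the rewritten history needs a force push, which is where the headaches for other people come from.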

    What's going on

    Every commit in a Git repository represents the exact state of all files in that repository. That is, each commit holds a full snapshot of every file. Submodule "files" are simply commit hash IDs stored in some path name. This is a directive to the superproject Git: When this commit is checked out, so that these submodule paths and hash IDs are in Git's index, git submodule update shall enter each of the named submodules and run git checkout on the specified commit.
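To see how little a commit actually stores for a submodule, here is a small sandbox sketch. It assumes nothing about the real project: the hash is a placeholder that names no real commit, and the entry is written with the plumbing command git update-index purely to avoid cloning anything. A normal workflow would use git submodule add and git add instead.

```shell
#!/usr/bin/env bash
# Sandbox demo: a submodule entry in a commit is just (mode 160000, hash, path).
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)
git init -q "$sandbox/super"
cd "$sandbox/super"

# Placeholder hash, not a real commit anywhere. The superproject never needs
# the submodule's commit object itself in order to record its hash.
pinned=0123456789abcdef0123456789abcdef01234567

# Plumbing shortcut: write the "gitlink" entry straight into the index.
git update-index --add --cacheinfo 160000,"$pinned",contrib/tor
git commit -q -m "pin contrib/tor"

# The committed tree holds only the mode, the type "commit", the hash, and the path.
entry=$(git ls-tree HEAD contrib/tor)
echo "$entry"
```

The mode 160000 is what marks the entry as a submodule pointer rather than a file or directory. Running git ls-tree HEAD contrib/tor contrib/ssl in the real superproject would show which hashes a given commit records.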

    So each commit that has some submodule really just has a path (contrib/tor, for instance) and a raw hash ID. That hash ID had better name a commit that exists in the submodule's clone. If the submodule isn't cloned yet, that clone is made from whatever URL is stored in .gitmodules or .git/config for that submodule. If it is already cloned, the commit must either exist there now or appear there once the superproject Git enters the submodule repository and runs git fetch.

    That's really all that submodules are: a directive to enter some other Git repository and run git checkout. To update the directive, you enter that repository yourself, run the git checkout you want, return to the superproject, and run git add. This updates Git's index, which is the source for the next commit you will make, so the next commit will have the new hash ID in it.

    Note that submodules do not make use of branch names. They're always done by raw hash ID. You can set up branch names for them, but these are (at least currently) only used for special git submodule update options (specifically git submodule update --remote). Because most Git commands use the raw hash ID, I recommend sticking with the raw hash ID method, at least until you're familiar with it and have a special case where git submodule update --remote becomes useful. Remember that even --remote still works by hash ID; it just gets the hash ID from somewhere tricky.
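As an illustration of that last point, the sandbox sketch below lets --remote move a submodule and then checks what changed. The repositories, the contrib/lib path, and the branch name main are all hypothetical choices for the demo. The -b option of git submodule add is what records the branch name in .gitmodules.

```shell
#!/usr/bin/env bash
# Sandbox demo: `git submodule update --remote` still ends in a checkout by hash.
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)
allow_local="protocol.file.allow=always"  # recent Git blocks local-path submodules by default

# Stand-in upstream with a branch named "main" (an arbitrary choice).
git init -q "$sandbox/lib"
git -C "$sandbox/lib" commit -q --allow-empty -m "first"
git -C "$sandbox/lib" branch -M main

# Superproject whose .gitmodules names that branch for the submodule.
git init -q "$sandbox/super"
cd "$sandbox/super"
git -c "$allow_local" submodule --quiet add -b main "$sandbox/lib" contrib/lib
git commit -q -m "add submodule"

# Upstream moves on.
git -C "$sandbox/lib" commit -q --allow-empty -m "second"
tip=$(git -C "$sandbox/lib" rev-parse main)

# --remote fetches, looks up the hash the branch points to, and checks it out.
git -c "$allow_local" submodule --quiet update --remote contrib/lib
checked_out=$(git -C contrib/lib rev-parse HEAD)

# The superproject's index still has the old hash until you run git add.
recorded=$(git rev-parse :contrib/lib)
git add contrib/lib
staged=$(git rev-parse :contrib/lib)

echo "upstream tip: $tip"
echo "checked out:  $checked_out"
echo "before add:   $recorded"
echo "after add:    $staged"
```

The branch name only decides which hash gets looked up. What ends up in the superproject's index, and later in a commit, is still a raw hash ID, and it gets there through the same git add step as before.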