Tags: git, git-submodules, git-revert

Git didn't remove a submodule after I checked out a commit from before it was added


Here's how this happened:

  1. I checked out a branch that has a new submodule added to it.
  2. Then I went back to a commit on a different branch where the submodule wasn't added.
  3. Now I can't compile because the checked-out submodule is still on disk: the files exist, and the directory is still there when I look at the root directory in a terminal.

So how do I fix this? Why didn't Git delete the folder when I went back to a commit where it never existed?

I've tried:

  • git submodule update on main repo.
  • git reset --hard in the repo of submodule.
  • git reset --hard in the main repo.

Nothing can delete this. I don't want to delete it manually because that is not the git way.


Solution

  • So how do I fix this?

    Manually remove all the submodule files from your work-tree: rm -rf submodule, for instance (with submodule replaced by the submodule's actual path).
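A minimal sandbox sketch of the situation and the manual fix, under some assumptions: the names lib, super, and with-sub are hypothetical, everything happens in a throwaway temporary directory, and protocol.file.allow=always is passed only because recent Git versions refuse to clone a submodule from a local path without it.

```shell
#!/bin/sh
# Sandbox sketch: reproduce the leftover submodule directory, then remove it by hand.
# The names lib, super and with-sub are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A small repository that will serve as the submodule's upstream.
git init -q "$work/lib"
echo hello > "$work/lib/file.txt"
git -C "$work/lib" add file.txt
git -C "$work/lib" commit -q -m "lib: initial"

# The superproject: one commit without the submodule, then a branch that adds it.
git init -q "$work/super"
cd "$work/super"
git commit -q --allow-empty -m "super: initial"
base=$(git symbolic-ref --short HEAD)
git checkout -q -b with-sub
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "super: add lib"

# Go back without submodule recursion: the populated lib/ directory stays on disk.
git checkout -q --no-recurse-submodules "$base"
if [ -e lib/file.txt ]; then leftover=yes; else leftover=no; fi
echo "leftover after checkout: $leftover"

# The manual fix from the answer: delete the work-tree copy.
rm -rf lib
git status --short
```

Only the work-tree copy is deleted here. The submodule's repository, absorbed under .git/modules/lib in the superproject, is untouched, so checking out with-sub again and running git submodule update can repopulate the directory.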

    Why didn't git delete the folder when I went back to a commit where it never existed?

    Submodules are messy.

    Remember, a submodule, in Git, is really just another Git repository. This other Git repository is totally and completely independent of your main (superproject) repository. Or rather, it would be—you'd just have two separate clones, side by side, with no interaction at all between them—except for the fact that you've stuck the submodule clone into the work-tree for the superproject. And, in the process:

    • In modern Git, the repository itself gets "absorbed" into the superproject's repository directory (so that there's no .git sub-folder but instead a .git file in the work-tree copy of the checked-out commit from the submodule).

    • The superproject's Git records information about the submodule in the superproject's configuration (not just the .gitmodules file) and in some commits. In particular, in commits where the submodule is supposed to be checked out at some particular hash ID, the hash ID itself is stored as a gitlink entry in each commit.

    • The superproject's Git keeps cd-ing into the submodule and doing git checkout <hash-id> which detaches HEAD in the submodule. (Exactly when it does this is another issue; it depends on whether you have submodule recursion enabled.)
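The three points above can be checked by hand. The following is a sandbox sketch, not part of the original answer: the names lib and super are hypothetical, and protocol.file.allow=always is only there so that recent Git versions accept a local path as the submodule URL.

```shell
#!/bin/sh
# Sandbox sketch: look at what Git records about a freshly added submodule.
# The names lib and super are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$work/lib"
echo hello > "$work/lib/file.txt"
git -C "$work/lib" add file.txt
git -C "$work/lib" commit -q -m "lib: initial"

git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "super: add lib"

# 1. The work-tree copy holds a .git *file* that points at the absorbed repository.
cat lib/.git

# 2. The superproject's own configuration, not only .gitmodules, knows the submodule.
git config --get-regexp '^submodule\.lib\.'

# 3. The commit stores a gitlink: mode 160000, type "commit", and the submodule hash ID.
git ls-tree HEAD lib
```

The hash ID printed by git ls-tree is the one the superproject checks out, as a detached HEAD, inside the submodule.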

    In general, the fact that the submodule is a (mostly) independent repository inhibits the superproject Git from completely deleting it. In this particular case, it probably should just go ahead and delete it as part of "update the submodule to the correct commit", because the "correct" commit in the submodule in question is "no commit at all: remove the submodule from the work-tree". In the old days, before Git absorbed the .git directory into the superproject, this would literally destroy the submodule repository, which would be bad.[1] In modern Git, it won't (once the submodule .git directory is absorbed at least), so it can be made to work.


    [1] In many situations, it would be harmless but for the need to re-clone. But suppose you made a superproject commit by (1) making a submodule commit on a new branch you made in the submodule; (2) exiting the submodule back to the superproject and updating the index gitlink entry to record the new hash ID in the submodule; (3) making the new commit. You now have a submodule commit that is not pushed anywhere, but can and should be pushed before the superproject commit is pushed.

    Now that you have this all set up, you decide to take a look at a historic commit. You git checkout <hash> in the superproject, and that historic commit doesn't have the submodule checked out. You have submodule.recurse set to true or have used git checkout --recurse-submodules so that the superproject brings the submodule to the correct (lack of) commit, and removes the submodule.

    In the old layout, where the .git directory literally lives inside the submodule's work-tree, this completely, irrevocably destroys the commit you made in the submodule, on the branch you made there, that you have not yet pushed. So this must not be done.

    Now that the .git directory itself lives elsewhere, removing the submodule bodily from the superproject's work-tree does not harm the submodule repository. So now, this kind of removal would be safe. Restoring the superproject to the latest commit can simply restore the submodule's work-tree, checking out the commit that is GC-protected by the submodule's repository's branch (still not yet pushed).
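The round trip described in this footnote can be sketched in a sandbox. This is an illustration under stated assumptions rather than a definitive recipe: the names lib, super, with-sub, and topic are hypothetical, protocol.file.allow=always is only needed because recent Git versions restrict cloning from local paths, and the behavior shown relies on a Git version that supports git checkout --recurse-submodules with an absorbed submodule repository.

```shell
#!/bin/sh
# Sandbox sketch: a recursive checkout removes the submodule work-tree and
# later restores it, without losing an unpushed submodule commit.
# The names lib, super, with-sub and topic are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$work/lib"
echo hello > "$work/lib/file.txt"
git -C "$work/lib" add file.txt
git -C "$work/lib" commit -q -m "lib: initial"

git init -q "$work/super"
cd "$work/super"
git commit -q --allow-empty -m "super: initial"
base=$(git symbolic-ref --short HEAD)
git checkout -q -b with-sub
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "super: add lib"

# Steps (1) to (3) of the footnote: an unpushed submodule commit on a new
# branch, recorded in the superproject through the gitlink entry.
git -C lib checkout -q -b topic
echo change >> lib/file.txt
git -C lib commit -q -am "lib: local-only change"
tip=$(git -C lib rev-parse HEAD)
git add lib
git commit -q -m "super: record new lib commit"

# Visit the historic commit with recursion enabled: lib's files go away.
git checkout -q --recurse-submodules "$base"
if [ -e lib/file.txt ]; then gone=no; else gone=yes; fi

# Come back: the work-tree is restored from the absorbed repository.
git checkout -q --recurse-submodules with-sub
git -C lib log --oneline -1
```

Setting git config submodule.recurse true in the superproject should make a plain git checkout behave the same way, so the flag does not have to be repeated on every command.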