Git submodule update gets stuck at old commit


Ok this one is driving me crazy. I tried committing/pushing/updating a parent repository while a file from its submodule (an XLS spreadsheet) was open in another program. The operation "succeeded" with only a "Couldn't unlink old somefile.xls" warning.

Now I'm trying to git submodule update and it keeps pointing to an old commit several steps back. Git log on the submodule's main branch shows that HEAD is the latest commit both locally and remotely, but whenever I cd back to the parent repository and update, the submodule ends up detached on this old commit.

I tried manually updating the reference in .git/modules/mysubmodule/HEAD (which is pointing to this old commit) but apparently that's not how things work. How can I get out of this frustrating loop? I suppose making some insignificant changes to the submodule and making a new commit could fix it (I tried an empty commit without luck though), but I want to better understand what happened so I can avoid this situation in the future.

Here's my submodule git log:

commit 713a39e531463eb9a9a608344ca39acbe520c7c4 (HEAD -> main, origin/main, origin/HEAD)

Here's what git submodule update outputs:

Submodule path 'data': checked out '7e4dc2354f5e60a8efb101a5d8a03466a911d86f'


Solution

  • Your mistake here lies in thinking that submodules should work. 😀

    OK, to be fair to submodules and Git, let's make that: should work automatically. Submodules can be made to work, but it's painful. (This is why some call them sob-modules.)

    The root of the problem is that a submodule is some other Git repository. Moreover, it's usually a clone of a third Git repository over which you may have little or no control. Each Git repository—each clone of a Git repository—is an island unto itself. ("No man is an island", but every Git repository is one.)

    For a Git repository to be a submodule, it must, by definition, be controlled by some other Git repository. Yet the two Git repositories involved in this insist that they shall never be controlled. So we have a problem.

    Git's solution to this problem goes like this:

    • In the superproject repository R, which would like to control submodule repository S, we place two things:

      1. There is a file called .gitmodules (in every commit, as files always are in Git, so that it's in the current commit no matter which commit you check out in R). This file lists what the superproject Git needs, namely the submodule's path and URL, in order to run git clone and create S.
      2. In each commit in R that uses some commit from S, there is an entity that Git calls a gitlink. Git will copy this entity out of a commit into Git's index / staging-area.
    • Once the submodule S exists (whether you made it yourself, or let a Git command run in R create it), the Git commands that you run in R will run git switch --detach <hash> in S.

    What this means is that R is in charge of which commit is to be used in S. Every commit you make in R lists the exact commit hash ID in S that will go with that commit in R.
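
    To see both pieces for yourself, here is a minimal sketch that builds a throwaway superproject and submodule in a temporary directory. The names upstream and super and the demo identity are made up for illustration; data is just the submodule path from the question. The -c protocol.file.allow=always setting is only there because the sketch clones from a local path, which recent Git versions refuse to do for submodules by default.

```shell
set -eu
# Scratch area; "upstream" and "super" are hypothetical names for S's origin and R.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q -b main upstream
git -C upstream commit -q --allow-empty -m "first S commit"

git init -q -b main super
cd super
# protocol.file.allow is only needed because this sketch clones from a local path.
git -c protocol.file.allow=always submodule --quiet add -b main "$work/upstream" data
git commit -q -m "add submodule data"

# 1. .gitmodules: what R needs in order to clone S.
cat .gitmodules
url=$(git config -f .gitmodules --get submodule.data.url)

# 2. The gitlink: a tree entry with mode 160000 and type "commit".
git ls-tree HEAD data
gitlink=$(git ls-tree HEAD data | awk '{print $3}')

# The hash that R records is the commit currently checked out in S.
sub_head=$(git -C data rev-parse HEAD)
echo "R records $gitlink; S is at $sub_head"
```

    The two hashes printed on the last line should match: the gitlink is nothing more than S's commit hash stored in R's tree.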

    Running:

    git submodule update
    

    (with no other options) is a directive to the Git commands controlling R that they should:

    • read the hash ID for S from R's index / staging-area;
    • run git switch --detach <hash> in S using that hash ID.

    Until you change the hash ID there, git submodule update will keep checking out that particular commit.
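
    This is most likely the loop described in the question. The sketch below tries to reproduce it in a scratch directory: S's main branch is moved to the latest commit, yet a plain git submodule update in R puts S back on the older commit that R's index records. As before, upstream, super, and the demo identity are invented for the example.

```shell
set -eu
# Scratch area; "upstream" and "super" are hypothetical names for S's origin and R.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q -b main upstream
git -C upstream commit -q --allow-empty -m "first S commit"
git init -q -b main super
cd super
# protocol.file.allow is only needed because this sketch clones from a local path.
git -c protocol.file.allow=always submodule --quiet add -b main "$work/upstream" data
git commit -q -m "add submodule data"

# S's origin gains a commit, and S's own main branch is fast-forwarded to it.
git -C "$work/upstream" commit -q --allow-empty -m "second S commit"
git -C data fetch -q origin
git -C data merge -q --ff-only origin/main
latest=$(git -C data rev-parse main)

# R's index still records the first commit ...
recorded=$(git ls-files -s data | awk '{print $2}')

# ... so a plain update detaches S's HEAD back onto that recorded commit.
git submodule --quiet update
now=$(git -C data rev-parse HEAD)
echo "recorded=$recorded now=$now latest=$latest"
```

    Comparing git ls-files -s data in R against git rev-parse HEAD in S is a quick way to check which commit R is asking for before you run the update.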

    On the other hand, running:

    git submodule update --remote
    

    means something very different. Here, the Git operating in R enters S and runs:

    git fetch
    

    This causes the Git operating in S to reach out to the Git from which S was first cloned (S's origin) and see what new commits they have that S doesn't. Those new commits go into the S clone you have locally. They aren't being used yet, but now they exist. The git fetch operation also updates the various remote-tracking names such as origin/main and origin/xyzbranch within your clone S.

    Now that this is done, the Git running on behalf of R executes:

    git rev-parse origin/main
    

    or whatever other name you've chosen, to find out what commit S's origin's main identifies, by hash ID. That hash ID, whatever it is, is now used with the usual:

    git switch --detach <hash>
    

    so that S's current commit is now the commit found by their origin/main or whatever.

    That commit is checked out in S, but it's not listed in R anywhere. Running git submodule status or git status in R will show that S is out of sync with the hash ID that the index/staging-area for R says that S should have.
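
    A rough way to watch this happen, again in a scratch directory with made-up names (upstream, super, a demo identity): after git submodule update --remote, S sits on its origin's newest commit while R's index still names the old one. The sketch records branch main in .gitmodules with -b main so that the --remote step does not depend on any default.

```shell
set -eu
# Scratch area; "upstream" and "super" are hypothetical names for S's origin and R.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q -b main upstream
git -C upstream commit -q --allow-empty -m "first S commit"
git init -q -b main super
cd super
# protocol.file.allow is only needed because this sketch clones from a local path.
git -c protocol.file.allow=always submodule --quiet add -b main "$work/upstream" data
git commit -q -m "add submodule data"

# S's origin moves ahead by one commit.
git -C "$work/upstream" commit -q --allow-empty -m "second S commit"

# Fetch in S, then check out whatever origin/main now names.
git -c protocol.file.allow=always submodule --quiet update --remote data

recorded=$(git ls-files -s data | awk '{print $2}')
checked_out=$(git -C data rev-parse HEAD)

# A leading "+" marks a checkout that differs from the hash in R's index.
git submodule status
status_line=$(git submodule status)
```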

    To update the Git index in R, you must now run:

    git add path/to/submodule
    

    which records the hash ID that's actually checked out in S, in the index that the R Git is using for R. This is not yet committed: like anything in Git's index / staging-area, it's simply ready to go into the next commit you make. You can now update any other files in R as well if necessary, and git add those, and then run git commit to make a new commit with a new gitlink.
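
    Put together, the whole round trip might look like the following sketch. It uses the same invented scratch setup (upstream, super, a demo identity) and ends with a new commit in R whose gitlink names S's latest commit.

```shell
set -eu
# Scratch area; "upstream" and "super" are hypothetical names for S's origin and R.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q -b main upstream
git -C upstream commit -q --allow-empty -m "first S commit"
git init -q -b main super
cd super
# protocol.file.allow is only needed because this sketch clones from a local path.
git -c protocol.file.allow=always submodule --quiet add -b main "$work/upstream" data
git commit -q -m "add submodule data"

# S's origin moves ahead, and R pulls that commit into its S clone.
git -C "$work/upstream" commit -q --allow-empty -m "second S commit"
git -c protocol.file.allow=always submodule --quiet update --remote data

# Stage the hash that is now checked out in S ...
git add data
staged=$(git diff --cached --name-only)

# ... and commit it, creating a new gitlink in R.
git commit -q -m "move data to the latest S commit"
gitlink=$(git ls-tree HEAD data | awk '{print $3}')

# A leading space (no "+") now means R and S agree.
git submodule status
```

    From here on, a plain git submodule update in this clone of R checks out the new commit rather than the old one.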

    The new R commit will now call for the commit in S that you obtained when you ran git submodule update --remote from R to update your S from S's origin. Note that none of these have anything to do with R itself, and you don't have to pick out an S commit by doing git submodule update --remote. Since S is a repository, you can enter the submodule:

    cd path/to/submodule
    

    and you're now operating in a Git for S instead of a Git for R. You can now do everything you'd do in any ordinary repository, because you're in any ordinary repository. It's just that this ordinary repository is acting as a submodule too. So once you get S onto a commit you like—even if you have to make this commit—you can pop back over to repository R and git add path/to/submodule to get the new hash ID recorded.

    Remember, though, that if and when you make a new commit in R and git push that commit to R's origin, someone else can grab the new commit from that (fourth) Git repository to their (fifth) Git repository that's a clone of R. There's no problem so far, but if they now check out your new commit, that commit you just made says that they should control their S clone by checking out the commit you made in your S clone. If you have not yet sent this commit to someplace that they can find it, they will now get an error if they run git submodule update in their R clone.

    (By this point we're up to six or eight or maybe even 42 clones depending on how many submodules you're using, and it's pretty confusing. The key is to remember that superprojects (the Rs in the above notation) call for commits in their submodules by raw hash ID, which means that anyone who clones the submodule needs to get a commit with that hash ID, which means that you typically need to git push in the submodules before you git push in the superprojects. Since all we ever do with any repository is add new commits (we never run git reset or git push --force or git rebase, right?), this always works. Well, until we start using reset, rebase, and forced pushing, or forget about the restrictions.)
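
    One guard against pushing in the wrong order is git push --recurse-submodules=check, which aborts the superproject push when a submodule commit it refers to is not on any of that submodule's remotes. The sketch below exercises this with two throwaway bare repositories; s-origin.git, r-origin.git, seed, super, and the demo identity are all invented for the example.

```shell
set -eu
# Scratch area; all repository names below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare repositories standing in for S's origin and R's origin.
git init -q --bare -b main s-origin.git
git init -q --bare -b main r-origin.git

# Give S's origin one commit to start from.
git init -q -b main seed
git -C seed commit -q --allow-empty -m "first S commit"
git -C seed push -q "$work/s-origin.git" main

git init -q -b main super
cd super
git remote add origin "$work/r-origin.git"
# protocol.file.allow is only needed because this sketch clones from a local path.
git -c protocol.file.allow=always submodule --quiet add -b main "$work/s-origin.git" data
git commit -q -m "add submodule data"
git push -q -u origin main

# Make a commit in S that exists only locally, and record it in R.
git -C data commit -q --allow-empty -m "local S commit"
git add data
git commit -q -m "use local S commit"

# The check refuses to push R while S's commit is on no remote.
if git push -q --recurse-submodules=check origin main 2>/dev/null; then
  check_result=pushed
else
  check_result=refused
fi
echo "first attempt: $check_result"

# Push S first, then R; the same check now passes.
git -C data push -q origin main
git push -q --recurse-submodules=check origin main
```

    Using --recurse-submodules=on-demand instead asks Git to push the affected submodules itself before pushing the superproject.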