Tags: git, git-merge, git-submodules, atlassian-sourcetree

What does committing submodules do in Git?


In a project with a main repo called Main, we have two submodules, Foo and Bar. Main has its own files and code along with the submodules.

In the submodules, I can commit & push changes into their respective remotes with no issues. While using Sourcetree, when staging & committing in Main, for each submodule, I see these changes that can be staged & committed:

-Subproject commit e9e31f9db84eaea701985f3d19427c2c500932cf 
+Subproject commit 55663d49815519c0d23153161c9cbd961c680113

Sourcetree lets me skip staging & committing these when committing something else. Usually, if I commit these and then try to merge into another branch, I get conflicts.

What are these changes? Should I be committing them instead of skipping them? How should I be managing them?


Solution

  • This:

    -Subproject commit e9e31f9db84eaea701985f3d19427c2c500932cf 
    +Subproject commit 55663d49815519c0d23153161c9cbd961c680113
    

    is one of several ways in which Git displays a git diff of two particular gitlinks.

    A gitlink is an entry that Git stores in its index, and therefore also in every commit you make while the index contains one. Each gitlink records the commit hash ID for a submodule.

    A submodule consists, in essence, of two parts:

    • First, there is one of these gitlink entities. The gitlink has a path within some commit in a repository, and we call that repository (the one containing the commit that contains the gitlink) the superproject. The path is, e.g., path/to/submodule or some-submodule: just like any normal Git file path, complete with forward slashes. Associated with that path is a hash ID: e.g., e9e31f...2cf or 55663d...113.

    • Second, in the same commit in the superproject, there should be a file named .gitmodules. This file contains the instructions (chiefly the submodule's path and URL) that Git needs in order to run git clone and obtain a clone that, if all goes well, contains the commit with the recorded hash ID, for instance 55663d49815519c0d23153161c9cbd961c680113.
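
    To make the two parts concrete, here is a minimal sketch that builds a throwaway superproject and prints both pieces. The repository names (foo, main), the scratch directory, and the demo identity are assumptions made for illustration; only the Foo path echoes the question.

```shell
set -eu
# Scratch setup; every path and name here is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A one-commit repository to serve as the submodule's origin.
git init -q foo
git -C foo commit -q --allow-empty -m "Foo: first commit"

# The superproject. Recent Git versions refuse local-path submodule clones
# unless protocol.file.allow is set, hence the -c option.
git init -q main
git -C main -c protocol.file.allow=always submodule --quiet add "$work/foo" Foo
git -C main commit -q -m "Main: add Foo"

# Part 1: the gitlink. Mode 160000 and object type "commit" mark it as such.
gitlink=$(git -C main ls-tree HEAD Foo)
echo "$gitlink"

# Part 2: the clone instructions stored in .gitmodules.
url=$(git -C main config -f .gitmodules --get submodule.Foo.url)
echo "$url"

# The hash in the gitlink is the submodule's commit at the time of git add.
foo_head=$(git -C foo rev-parse HEAD)
```

    The ls-tree line shows the hash the superproject commit records for Foo; nothing else about the submodule's contents is stored in the superproject.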

    When you make any commit in the superproject, that commit in the superproject contains the gitlink(s) for the submodule(s). Committing in the superproject creates a new commit: that new commit contains all the files that are in the superproject Git's index, including this gitlink or these gitlinks. And that's really all there is.

    For those gitlinks to work, the submodule (a separate Git repository) needs to have that commit in it. If you've run git add on the submodule path in the superproject, the superproject's gitlink now holds the hash ID of the commit that is currently checked out (HEAD) in the submodule, so that hash ID does in fact exist in your clone of the submodule at this point. All you need to do is make sure that this submodule commit is copied to every other clone of that submodule that might need it.

    In other words, the usual work-flow goes like this:

    1. Check out the desired commit in the superproject.
    2. Run git submodule update (perhaps with --init and/or --recursive if desired). The submodule is now cloned and has a commit checked out as a detached HEAD, if all went well.
    3. Enter the submodule if necessary and:
      1. check out any specific commit, perhaps by branch name, if desired/appropriate/necessary;
      2. do any work desired/appropriate/necessary;
      3. commit this work so that there is a commit in the submodule;
      4. return to the superproject and run git add on the submodule path to record a new gitlink in the superproject's index.
    4. Do any other work required in the superproject.
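
    The steps above can be sketched roughly as follows, using scratch repositories. The names foo, main, clone, and topic are hypothetical, and the empty commits merely stand in for real work.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: Foo's origin, plus a superproject that references it.
git init -q foo
git -C foo commit -q --allow-empty -m "Foo: first commit"
git init -q main
git -C main -c protocol.file.allow=always submodule --quiet add "$work/foo" Foo
git -C main commit -q -m "Main: add Foo"

# Steps 1-2: take a fresh clone of the superproject, then populate the submodule.
git clone -q main clone
git -C clone -c protocol.file.allow=always submodule --quiet update --init
old=$(git -C clone rev-parse HEAD:Foo)

# Steps 3.1-3.3: leave the detached HEAD, do some work, commit it in the submodule.
git -C clone/Foo checkout -q -b topic
git -C clone/Foo commit -q --allow-empty -m "Foo: second commit"
new=$(git -C clone/Foo rev-parse HEAD)

# The superproject now reports the pending gitlink change.
diff_out=$(git -C clone diff --submodule=short -- Foo)
echo "$diff_out"

# Step 3.4: record the new gitlink, then commit it in the superproject.
git -C clone add Foo
git -C clone commit -q -m "Main: update Foo gitlink"
```

    The diff printed before step 3.4 has the same "Subproject commit" shape as the one in the question: the old hash is what the superproject commit records, and the new hash is the submodule's current HEAD.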

    Unfortunately, the "outer level" steps 3 and 4 sometimes require several rounds of the inner steps 3.1 through 3.3 in the submodule. As a result we often want to defer step 3.4 until step 4 is complete. This makes things pretty tricky.

    You'll see that kind of git diff output (with "subproject commit" numbers shown) in multiple different circumstances, so if you want clarification on any one particular circumstance, you must be very explicit about how you got to that point.

    Once all the commits exist (in both submodule and superproject) it is generally necessary to transmit those commits to other clones. You can achieve this with git push, sometimes, or with git fetch from the other clones, sometimes, or perhaps a combination of several pushes and/or fetches. Remember that there are at least two repositories on your own computer—the superproject and the submodule—and those two repositories correspond to two repositories somewhere else, so you'll probably need to run git push twice at a minimum.

    (Modern Git has git push --recurse-submodules=on-demand to handle this semi-automatically, but there are various rough edges.)
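
    As a rough illustration of the two pushes, the sketch below uses two local bare repositories as stand-ins for the real remotes. All names (foo.git, main.git, seed, feature) are assumptions. It pushes the submodule explicitly and then pushes the superproject with --recurse-submodules=check, which makes Git refuse the push if a referenced submodule commit has not reached any of the submodule's remotes.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical remotes: two bare repositories standing in for the hosted copies.
git init -q --bare foo.git
git init -q --bare main.git

# Seed Foo's remote with one commit.
git init -q seed
git -C seed commit -q --allow-empty -m "Foo: first commit"
git -C seed push -q "$work/foo.git" HEAD

# Superproject with Foo as a submodule, pushed to its own remote.
git init -q main
git -C main remote add origin "$work/main.git"
git -C main -c protocol.file.allow=always submodule --quiet add "$work/foo.git" Foo
git -C main commit -q -m "Main: add Foo"
git -C main push -q origin HEAD

# New work: one commit in the submodule, then the matching gitlink commit.
git -C main/Foo commit -q --allow-empty -m "Foo: second commit"
git -C main add Foo
git -C main commit -q -m "Main: update Foo gitlink"

# Push the submodule first so that its remote has the commit...
git -C main/Foo push -q origin HEAD:refs/heads/feature
# ...then the superproject; "check" aborts if a referenced commit is unpushed.
git -C main push -q --recurse-submodules=check origin HEAD

recorded=$(git -C main rev-parse HEAD:Foo)
```

    Pushing in this order means that anyone who fetches the superproject commit can also obtain the submodule commit its gitlink names.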