
How can different branches of a primary repo use different branches of a submodule?


Suppose that I have a primary repo y with some submodule, say at sub/x.

Suppose also that, for both the primary and the submodule repos, master is the active branch, and that the .gitmodules file of the primary repo specifies branch = master.

Now, suppose that, in addition to its master branch, the primary (y) repo has a branch yA, and likewise, in addition to its master branch, the submodule repo (x) has a branch xA.

I would like the yA branch of the y repo to "see"/use the xA branch of the x repo.

This would mean that switching between the master and yA branches in the primary repo would cause the corresponding switch between the master and xA branches in the submodule.

Does git have any support for this?


I tried the following:

  1. switched to the yA branch on the primary repo;
  2. switched to the xA branch on the submodule repo;
  3. replaced master with xA as the value for the branch parameter in the primary repo's .gitmodules file;
  4. in the primary repo, committed all the changes resulting from (2) and (3).

This did not work as I had hoped: if I switch to the master branch on the primary repo, this has no effect on the active branch setting of the submodule repo (and therefore, the master branch does not have a clean status).
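A minimal reproduction of the four steps above makes the symptom concrete. It is a sketch, not the asker's actual setup: the scratch repos x and y, the file f, and the throwaway HOME are hypothetical stand-ins, and the script assumes a reasonably recent Git (it sets init.defaultBranch and protocol.file.allow). After step 4, switching y back to master leaves the submodule on xA, so git status reports sub/x as modified.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# As in the question: .gitmodules on master says branch = master.
git config -f .gitmodules submodule.sub/x.branch master
git commit -qam "sub/x: branch = master"

git checkout -qb yA                                        # step 1
(cd sub/x && git checkout -q xA)                           # step 2 (local xA from origin/xA)
git config -f .gitmodules submodule.sub/x.branch xA        # step 3
git add .gitmodules sub/x && git commit -qm "yA uses xA"   # step 4

# Switching the superproject back does not touch the submodule.
git checkout -q master
(cd sub/x && git rev-parse --abbrev-ref HEAD)   # still xA
git status --porcelain                          # lists sub/x as modified
```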


Solution

  • Does git have any support for [what I want]?

    Sort of, I think. You will need to make sure that the superproject's .git/config does not acquire a branch setting for the submodule.

    TL;DR

    Use different branch = settings in the .gitmodules file committed on each superproject branch, and run git submodule update --remote as needed. I have not tested this, but see the long description.

    Long

    My overall general advice: the branch setting of a submodule is mostly useless and irrelevant. Just ignore it. We'll get to the mostly part in a bit, though, and you can see if you can use it.

    A submodule is a Git repository into which some other Git will, on occasion, cd and run some Git command. That other Git is called the superproject.

    The superproject Git's main operation is:

    (cd $path && git checkout $hash)
    

    Nowhere in this sequence does any branch name occur. That's why the branch setting is irrelevant.

    The $path and $hash parts come from the superproject Git's index, and they got there by being extracted from a commit in the superproject. That commit recorded the path of the submodule, and a raw hash ID. No branch name occurs here either.
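    To see the ($path, $hash) pair that the superproject actually stores, you can look at the gitlink entry (mode 160000) in the commit and in the index. The sketch below uses hypothetical scratch repos (superproject y, submodule x at sub/x, a throwaway HOME); sm_path and sm_hash are illustrative variable names, and the last checkout is roughly what a plain git submodule update does in checkout mode.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# The commit and the index hold a gitlink (mode 160000): a hash and a path, no branch.
git ls-tree HEAD sub/x          # 160000 commit <hash><TAB>sub/x
git ls-files --stage sub/x      # 160000 <hash> 0<TAB>sub/x
entry=$(git ls-files --stage sub/x)
sm_hash=$(printf '%s\n' "$entry" | cut -d' ' -f2)
sm_path=$(printf '%s\n' "$entry" | cut -f2)

# Move the submodule off the recorded commit, then do roughly what a plain
# "git submodule update" (checkout mode) does.
(cd sub/x && git checkout -q xA)
(cd "$sm_path" && git checkout -q "$sm_hash")
(cd sub/x && git rev-parse --abbrev-ref HEAD)   # HEAD, i.e. detached
```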

    When you run git checkout or git switch in the superproject, to select some branch name and therefore some particular commit, the superproject Git extracts that commit to its (the superproject's) index and to your work-tree for that superproject. This puts the correct ($path, $hash) pair into the superproject's index.

    Unfortunately, it does not invoke the (cd $path && git checkout $hash) part by default, to update the submodule. To make it do so, you have several options:

    • Run git submodule update. This command does exactly that (well, by default anyway: see details below).
    • Run git checkout --recurse-submodules (or the same flag for git switch). This command makes git checkout run the update, and propagates into the submodule Git, so that when that submodule runs git checkout (or git switch), if the submodule is a superproject for another submodule, that submodule will, in its superproject role, invoke the update. This will repeat (recursively) for all nested submodules. (I generally don't use this but I have not had to deal with recursive submodules much. It's quite powerful, because of the recursion.)
    • Set submodule.recurse to true. This enables the --recurse-submodules option on multiple commands, including checkout/switch, but also on git fetch and git pull. (I dislike this one: I think it's too powerful. However, you can set it, and then explicitly disable recursive push with the push.recurseSubmodules setting.)
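    A sketch of these three options, using hypothetical scratch repos (superproject y, submodule x at sub/x, a throwaway HOME) in which branch yA records a different submodule commit than master does. The variables after_update, after_recurse and after_config are illustrative names that capture where the submodule lands after each command.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# Branch yA records a different submodule commit (the xA tip) than master does.
git checkout -qb yA
(cd sub/x && git checkout -q --detach "$x_xA")
git add sub/x && git commit -qm "yA: sub/x at the xA tip"
git checkout -q master               # plain checkout: sub/x is left at the xA tip

# Option 1: run the update yourself.
git submodule update
after_update=$(cd sub/x && git rev-parse HEAD)     # master's gitlink

# Option 2: recurse on checkout (git switch accepts the same flag).
git checkout -q --recurse-submodules yA
after_recurse=$(cd sub/x && git rev-parse HEAD)    # yA's gitlink

# Option 3: make recursion the default, but keep push non-recursive.
git config submodule.recurse true
git config push.recurseSubmodules no
git checkout -q master
after_config=$(cd sub/x && git rev-parse HEAD)     # master's gitlink again
```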

    Details, or, when does the branch setting matter?

    The git submodule documentation has several long and fairly impenetrable paragraphs to describe the git submodule update sub-command. (I believe this indicates a flaw in the overall setup of submodules, but we must work with what we have, at least until we can come up with something better.) Let me quote from it here:

    update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--[no-]single-branch] [--] [<path>...​]
    Update the registered submodules to match what the superproject expects by cloning missing submodules, fetching missing commits in submodules and updating the working tree of the submodules. The "updating" can be done in several ways depending on command line options and the value of submodule.<name>.update configuration variable. ...

    As you can see, there are many options. To keep this answer from getting even longer, let's concentrate on just three of them: --checkout, --rebase, and --merge. There are two more that aren't options but that you can set with the submodule.name.update variable, which we'll ignore here. These options—--checkout, --rebase, and --merge—set which kind of action the update will use, which is the same as the option name without the leading double hyphen.

    The checkout mode is the default default. That is, if you have not set an explicit submodule.name.update setting, and you don't specify --rebase or --merge, you get checkout. So that's what everyone uses (mostly!), and that is what the word mostly is doing in the overall general advice at the top of this answer.

    Now, on to the three modes. I'll quote from the documentation again, with some minor formatting changes and commentary afterward:

    • checkout
      the commit recorded in the superproject will be checked out in the submodule on a detached HEAD.

    • rebase
      the current branch of the submodule will be rebased onto the commit recorded in the superproject.

    • merge
      the commit recorded in the superproject will be merged into the current branch in the submodule.
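    The practical difference between checkout mode and merge mode is which branch, if any, the submodule is on afterward. This sketch uses hypothetical scratch repos: it records the xA tip in the superproject, puts the submodule back on its own master branch, and then runs each mode. --rebase is not exercised here; per the documentation quoted above it likewise operates on the submodule's current branch.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# Record the xA tip as the gitlink, then put the submodule back on its master branch.
(cd sub/x && git checkout -q --detach "$x_xA")
git add sub/x && git commit -qm "sub/x at the xA tip"
(cd sub/x && git checkout -q master)   # branch master, still at the base commit

# checkout mode (the default): detached HEAD at the recorded commit.
git submodule update --checkout
mode_checkout=$(cd sub/x && git rev-parse --abbrev-ref HEAD)   # HEAD

# merge mode: merge the recorded commit into the submodule's current branch
# (a fast-forward here, since the xA tip descends from master's tip).
(cd sub/x && git checkout -q master)
git submodule update --merge
mode_merge=$(cd sub/x && git rev-parse --abbrev-ref HEAD)      # master
```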

    So, with the default mode, no branch name enters the picture anywhere. Only the rebase and merge modes actually make use of a branch name. So now we get to ask the question: which branch name?

    The documentation makes it clear: the current branch in the submodule. That's not the branch = setting of the submodule; it's the current branch in the submodule.

    But what branch is current, in the submodule? You can find out, if you like:

    (cd $path && git rev-parse --abbrev-ref HEAD)
    

    will tell you which branch, if any, is current in the submodule at $path. It prints HEAD if the submodule is in detached-HEAD mode, as it will be after git submodule update --checkout, or after any git submodule update that uses checkout mode.

    If you had to predict the current branch, or whether the submodule is on a detached HEAD and therefore not on any branch at all, what would you predict? Well, have you run git submodule update? You had to do a git submodule update --init initially, unless you did a recursive-mode clone (git clone --recurse-submodules), in which case Git did a git submodule update --init --checkout for you. So chances are that your submodule is in detached-HEAD mode, and therefore has no current branch.
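    One way to check that prediction: a fresh git clone --recurse-submodules runs the initial update for you, and git submodule foreach can run the rev-parse command in every submodule at once. The repos are hypothetical scratch ones; $sm_path is one of the variables git submodule foreach provides to the command it runs (older Git versions called it $path).

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# A recursive clone runs the initial "git submodule update --init" for you.
git clone -q --recurse-submodules "$tmp/y" "$tmp/y2"
cd "$tmp/y2"
# In each submodule, print its path and current branch (HEAD means detached).
out=$(git submodule --quiet foreach 'echo "$sm_path: $(git rev-parse --abbrev-ref HEAD)"')
echo "$out"    # sub/x: HEAD
```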

    We're still a bit at sea, in other words. How do we get the submodule Git to be on a branch in the first place?

    There's one simple and obvious method: we can do our own (cd $path; git checkout $branch) where we provide the $path and $branch ourselves. That way, the submodule is on the branch we want, whatever commit that is. But since we're providing $branch, we don't need a setting. We just do:

    (cd path/to/submodule; git checkout feature/foo)
    

    directly. So that's not it either.

    If we scroll down to the OPTIONS section in the documentation, and then scroll further down to the --remote option, we finally find the one place where the setting is actually used:

    --remote
    This option is only valid for the update command. Instead of using the superproject’s recorded SHA-1 to update the submodule, use the status of the submodule’s remote-tracking branch. The remote used is branch’s remote (branch.<name>.remote), defaulting to origin. The remote branch used defaults to the remote HEAD, but the branch name may be overridden by setting the submodule.<name>.branch option in either .gitmodules or .git/config (with .git/config taking precedence).

    This works for any of the supported update procedures (--checkout, --rebase, etc.). The only change is the source of the target SHA-1. For example, submodule update --remote --merge will merge upstream submodule changes into the submodules, while submodule update --merge will merge superproject gitlink changes into the submodules.

    Alllllll-righty then!

    Seriously, this text is really hard to read—but what it says is that git submodule update --remote won't just use the raw SHA-1 hash ID from the superproject. Instead, it will use a raw SHA-1 hash ID it gets from somewhere else. Where, precisely, is the somewhere else?

    In order to ensure a current tracking branch state, update --remote fetches the submodule’s remote repository before calculating the SHA-1. If you don’t want to fetch, you should use submodule update --remote --no-fetch.

    So: when you use --remote with your git submodule update command, the superproject will:

    • step 1: (cd $path; git fetch), unless you add --no-fetch
    • step 2: (cd $path; git rev-parse $(complicated)) to get a hash ID.

    The $(complicated) part is complicated, but it grabs the branch name from the branch = setting (e.g., branch = master) in either .gitmodules or .git/config, with .git/config taking precedence; if neither has a setting, it uses the remote's HEAD. It turns this into the remote-tracking name, such as origin/master, that step 1 will have just updated. See also VonC's answer to "How can I specify a branch/tag when adding a Git submodule?".
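    The two steps can be approximated by hand. This is a simplified sketch, not Git's actual implementation: it assumes the submodule's remote is origin, looks up submodule.<name>.branch in .git/config and then in .gitmodules, and does not handle the unset case (where Git falls back to the remote HEAD) or the special . value. The repos and the variables name, branch and sm_hash are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

git config -f .gitmodules submodule.sub/x.branch xA

# Simplified by-hand version of "git submodule update --remote sub/x".
name=sub/x    # submodule name (equal to its path here)
branch=$(git config --get "submodule.$name.branch" ||
         git config -f .gitmodules --get "submodule.$name.branch")
# Step 1: fetch; step 2: turn the branch into origin/<branch> and resolve it.
sm_hash=$(cd sub/x && git fetch -q origin && git rev-parse "origin/$branch")
# Then the usual checkout of that hash, as in the default update mode.
(cd sub/x && git checkout -q "$sm_hash")
```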

    The special value . means "use the name of the superproject's current branch", but:

    I would like the yA branch of the y repo to "see"/use the xA branch of the x repo.

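    Here the names differ (yA in the superproject, xA in the submodule), and the . trick only works when the spellings match exactly, so it can't give you this. Also, if a branch setting for the submodule has been copied into the superproject's .git/config, that copy wins and stays set to whatever it is set to, no matter which superproject commit you have checked out; only if there is no such copy does the superproject Git read the branch = setting from the .gitmodules file.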
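    Both points can be checked in a scratch setup. In this sketch (hypothetical repos; the extra branch yA in x exists only so that the spellings match), branch = . follows the superproject's current branch name, and then a copy of the setting in the superproject's .git/config overrides .gitmodules until it is removed again with git config --unset.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# Hypothetical: give x a branch spelled exactly like the superproject branch.
(cd "$tmp/x" && git branch yA xA)

# branch = . means "same name as the superproject's current branch".
git config -f .gitmodules submodule.sub/x.branch .
git commit -qam "sub/x follows the same-named branch"
git checkout -qb yA
git submodule update --remote            # uses origin/yA in sub/x
dot=$(cd sub/x && git rev-parse HEAD)

# A copy of the setting in .git/config takes precedence over .gitmodules.
git config submodule.sub/x.branch master
git submodule update --remote            # now uses origin/master
override=$(cd sub/x && git rev-parse HEAD)
git config --unset submodule.sub/x.branch   # remove the override again
```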

    Suppose the .gitmodules file in superproject commit $SHA_YA (the commit that branch name yA selects) says branch = xA. Then, when you run git submodule update --remote (with or without --no-fetch), the superproject Git should do a git rev-parse on origin/xA, assuming submodule x has origin as its remote here. That becomes the source of the raw hash ID that superproject y passes to submodule x when y runs (cd sub/x && git checkout $hash).
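    A sketch of this case with hypothetical scratch repos: the superproject's recorded gitlink stays at x's base commit while .gitmodules says branch = xA. A plain update follows the gitlink; update --remote follows origin/xA instead, and it does not change what the superproject has committed for sub/x.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# The gitlink records x's base commit, but .gitmodules asks for branch xA.
git config -f .gitmodules submodule.sub/x.branch xA
git commit -qam "sub/x: branch = xA"

git submodule update               # follows the gitlink
plain=$(cd sub/x && git rev-parse HEAD)
git submodule update --remote      # fetches, then follows origin/xA
remote=$(cd sub/x && git rev-parse HEAD)

# The superproject's commit still records the base commit for sub/x.
git rev-parse HEAD:sub/x
```

To make the superproject remember the new commit, you would still have to git add sub/x and commit.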

    When you switch the superproject to some other commit, the .gitmodules file in that commit can have a different branch = setting. (The branch name you switch to is not what matters here; what matters is the commit hash ID, and the .gitmodules file that is part of that commit.) Your git submodule update --remote command will find that setting and have the submodule Git do a different git rev-parse, yielding a different hash ID for the superproject to tell the submodule to check out.
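    Putting this together for the question's layout, as a sketch with hypothetical scratch repos rather than a tested recipe for a real project: master's .gitmodules says branch = master, yA's says branch = xA, and each switch is followed by git submodule update --remote. The first command checks that .git/config holds no overriding branch setting.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# Make sure .git/config has no copy of the branch setting that would override .gitmodules.
git config --get submodule.sub/x.branch || echo "no branch setting in .git/config"

# master: branch = master.  yA: branch = xA.  The gitlinks stay as they are.
git config -f .gitmodules submodule.sub/x.branch master
git commit -qam "master: sub/x follows master"
git checkout -qb yA
git config -f .gitmodules submodule.sub/x.branch xA
git commit -qam "yA: sub/x follows xA"

# After each switch, --remote reads the .gitmodules now in the work tree.
git submodule update --remote
on_yA=$(cd sub/x && git rev-parse HEAD)
git checkout -q master && git submodule update --remote
on_master=$(cd sub/x && git rev-parse HEAD)
git status --porcelain    # empty: master is clean again
```

Note that on yA, git status will show sub/x as modified after the update, because --remote moved the submodule without changing the gitlink that yA records.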

    It is all very complicated, with a lot of moving parts, and they must all line up at the right time. Ultimately, the superproject really just uses raw hash IDs, so it is less headache-inducing to just use the right raw hash IDs. Once they're in a commit they cannot be changed, and that's normally the right thing; you just have to make sure they're correct before you commit.
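    A sketch of the hash-based approach, with hypothetical scratch repos: put the submodule on the commit you want, check it with git submodule status (a leading + means the checked-out commit differs from the superproject's index), then commit the gitlink. After that, a recursive switch restores the right submodule commit on each branch without any branch = setting.

```shell
#!/bin/sh
# Hypothetical scratch setup: throwaway repos x (submodule) and y (superproject).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config" GIT_CONFIG_NOSYSTEM=1
mkdir -p "$HOME"
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch master   # honoured by Git >= 2.28
git config --global protocol.file.allow always  # local-path submodules, Git >= 2.38.1
git init -q "$tmp/x" && cd "$tmp/x"
echo base >f && git add f && git commit -qm base
git checkout -qb xA && echo xA >f && git commit -qam xA && git checkout -q master
x_master=$(git rev-parse master) x_xA=$(git rev-parse xA)
git init -q "$tmp/y" && cd "$tmp/y"
git submodule --quiet add "$tmp/x" sub/x
git commit -qm "add sub/x"

# On yA, put the submodule on the commit you want yA to record.
git checkout -qb yA
(cd sub/x && git checkout -q xA)

# Check before committing: "+" means the checked-out commit differs from the index.
git submodule status
before=$(git submodule status | cut -c1)
git add sub/x && git commit -qm "yA: sub/x at the xA tip"
after=$(git submodule status | cut -c1)     # a space: matches the index again

# From now on a recursive switch restores the recorded commit on each branch.
git checkout -q --recurse-submodules master
```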