git, git-submodules, monorepo

Git submodule: track a branch effectively


I'd like to promote the idea of a monorepo within my company.
I plan to use submodules this way:

I have one 'parent' repo holding one submodule for each component of our stack, thus maintaining global versioning for the whole stack (we can simply check out every component on a given branch).

This sounds perfect because we can still benefit from any CI service out of the box (as we still push to independent git repos, the submodules).

The only (terrible) weakness of this approach is that if I do a

git submodule update --remote

Using the following config:

[submodule "commonLib"]
   path = commonLib
   url = [email protected]:org/commonLib.git
   branch = MY_BRANCH

Each submodule is indeed checked out at the right commit.

But they are all in a detached HEAD state.

Why is there no way to actually use git submodules with a branch, i.e. when updating, check out the branch itself and not the commit this branch points to? Is there a technical reason for this, or is it simply not yet implemented in git?

Thanks


Solution

  • One part of the answer is that git submodules are designed to allow a consistent, coherent view of a set of multiple repositories. The only way to achieve that is to have each submodule locked at a particular version, with the parent repo tracking the versions of all submodules, thus giving the overall project the appearance of a monorepo.

    In such a project context it doesn't make much sense to specify a submodule only at branch level, because that may pick up a version which isn't consistent with the rest of the project.

    Another part of the answer is not specific to git submodules but applies to any git repo: when checking out a specific version (a commit rather than a branch), the repo will be in a detached HEAD state. There is not a lot of support for branch identification, because in git branches don't have the same meaning and importance as in other version control systems; see this excellent answer for details: https://stackoverflow.com/a/3162929/4495081.
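
    To see which state a given checkout is in, something like the following could be used. It is only a minimal sketch: `head_state` is a hypothetical helper name, and it relies on `git symbolic-ref` failing when HEAD is not attached to a branch.

    ```shell
    # Print the branch HEAD is attached to, or "detached" when HEAD points
    # directly at a commit. $1 = path to a repository or submodule checkout.
    # Caveat: a path that is not a git checkout is also reported as "detached".
    head_state() {
        # with -q, symbolic-ref stays silent and exits non-zero on a detached HEAD
        git -C "$1" symbolic-ref --short -q HEAD || echo "detached"
    }

    # Usage, from the parent repo (commonLib is the submodule from the question):
    #   head_state commonLib
    ```

    Run right after `git submodule update --remote`, this would be expected to report a detached HEAD for each submodule, which matches the behaviour described in the question.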

    I see two possible ways of reducing the risk of human error when picking the right branch at update time:

    • use consistent branch names across all your repositories, and labels/tags (ideally produced by your CI/CD system) that encode the branch name in their identifier. This is quite a difficult sell at the beginning, with every component-owning team wanting complete decision-making power, but it can get better if/when the teams eventually understand that they actually need alignment to make a coherent product together.

    • provide project-level automation wrappers to operate on the repositories, which would extract the proper branch information from the parent repo (while also performing sanity checks and/or related operations to keep the developer's workspace and the project consistent).
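
    As an illustration of the second option, a wrapper could look roughly like the sketch below. The function names `submodule_branch` and `attach_submodules` are hypothetical, and the sketch assumes that each submodule's remote is called `origin` and that `master` is an acceptable fallback when no `branch` is set in `.gitmodules`; none of this is built into git.

    ```shell
    # Print the branch recorded for a submodule in the parent's .gitmodules.
    # $1 = parent repo path, $2 = submodule name, $3 = fallback branch
    # (the "master" fallback is an assumption; adjust to your default branch).
    submodule_branch() {
        git config -f "$1/.gitmodules" --get "submodule.$2.branch" || echo "${3:-master}"
    }

    # Update every submodule, then attach it to its configured branch
    # instead of leaving HEAD detached. $1 = parent repo path.
    attach_submodules() {
        parent=$1
        git -C "$parent" submodule update --init --remote || return 1
        # each output line looks like: submodule.<name>.path <path>
        git -C "$parent" config -f .gitmodules --get-regexp '^submodule\..*\.path$' |
        while read -r key path; do
            name=${key#submodule.}
            name=${name%.path}
            branch=$(submodule_branch "$parent" "$name")
            # fast-forward only, so local commits on the branch are never discarded
            git -C "$parent/$path" checkout -q "$branch" &&
                git -C "$parent/$path" merge -q --ff-only "origin/$branch" || return 1
        done
    }
    ```

    Calling `attach_submodules .` from the parent repo would then leave each submodule on its named branch, provided the local branch can be fast-forwarded. The parent repo still records a specific commit per submodule, so the consistency argument above is unchanged; the wrapper only affects the developer's working state.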