Tags: git, workflow, git-submodules

What are the advantages and disadvantages of using git submodules?


Problem

At work, we're faced with conflicting viewpoints regarding the use of submodules for reusable code in our embedded projects.

  • Some of the developers argue that submodules complicate the workflow, and that existing projects like the Linux kernel don't use them at all since the kernel is a single, flat repository, so we should follow that example. They argue that this "master project" should be the base of all our independent products, instead of having each product include only the submodules it requires.
  • Others argue that versioning everything in a single repository reduces modularity (it complicates reuse by other projects) and makes the code hard to maintain.

I don't know what to think. It is not the first time this discussion has come up, and it feels like we're running in circles (history repeats itself).

I need to better understand what the intended use of git submodules is, along with their advantages and drawbacks, to make an informed decision on what we should do.

Our current use of submodules

Right now, we are versioning "cores" (think an audio processing library, a wireless library, etc.) each in their own repository. Those cores are reused in many independent projects, running on many independent hardware platforms. The BSPs for those hardware platforms are also versioned independently, each in its own repository. The same goes for generic drivers for various hardware components. All these components are reused across independent projects, and some of them, mainly the "cores", have a very fast and risky development schedule; breaking changes are standard for those.

In short, we use submodules for:

  • libraries or "reusable modules"
  • BSP
  • drivers
  • third-party libraries such as vendor HAL

Solution

  • Git submodules are one way to combine the content of several repositories in a well-defined manner. They are a technical solution that you can use once you have decided to maintain your code in different repositories (or branches).

    So, the fundamental question you're asking is: do you want to use a single "monorepo" or a "polyrepo" setup?

    Both approaches are heavily debated and each has its own pros and cons, so my suggestion is to look carefully at which one fits your case best.

    Two key points are that

    • A monorepo makes it very easy to maintain changes to code that would otherwise live in separate places (e.g., your application using a new feature of a driver). On the downside, monorepos can be technically more demanding (repos are bigger and it can be tricky to establish an efficient CI pipeline).

    • A polyrepo setup (e.g., using submodules) is more complex to use (as your developers stated). Keeping things in different places means you need a good solution for putting together "matching" code from different sources. However, it's technically less demanding and it's straightforward to set up CI for each component.

    Finding a "good solution for putting together matching code" is hard, since you need to control dependencies explicitly (e.g., the application requires libX rev 1.1 and libY rev 2.2, but libY rev 2.2 has only been tested with libX rev 0.9). With a monorepo, you'd avoid this issue upfront.
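To make the dependency problem above concrete, here is a minimal Python sketch of the kind of check a polyrepo setup might run in CI. It is only an illustration under stated assumptions: the `tested_with` table is hypothetical metadata that each component would have to publish, and `find_mismatches` is a made-up helper name. Git submodules provide nothing like this out of the box.

```python
def find_mismatches(pinned, tested_with):
    """Return one tuple per pinned dependency that falls outside the tested set.

    pinned:      {component name: pinned version}
    tested_with: {(component, version): {dependency: set of tested versions}}
                 (hypothetical metadata, not something git records)
    """
    problems = []
    for name, version in pinned.items():
        requirements = tested_with.get((name, version), {})
        for dep, ok_versions in requirements.items():
            actual = pinned.get(dep)
            # Flag the combination only if the dependency is pinned at all
            # and its pinned version was never tested with this component.
            if actual is not None and actual not in ok_versions:
                problems.append((name, dep, actual, sorted(ok_versions)))
    return problems


# Revisions the application pins, mirroring the libX/libY example above.
pinned = {"libX": "1.1", "libY": "2.2"}

# Assumed compatibility data: libY rev 2.2 was only tested against libX rev 0.9.
tested_with = {("libY", "2.2"): {"libX": {"0.9"}}}

for name, dep, actual, expected in find_mismatches(pinned, tested_with):
    print(f"{name} was tested with {dep} {expected}, but the project pins {dep} {actual}")
```

In a monorepo this check is unnecessary, because there is only one revision of everything at any given commit. In a polyrepo, something has to play this role, whether it is a script like this, a package manager, or a manual release process.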