
Taking multiple repos and merging into a single repo (monorepo)


As a PM, I would like to know what the potential risks are of merging multiple repos into a monorepo.

I've tried asking the lead engineers what could go wrong, but they are so invested in getting this transition done for the 12 teams using the individual repos that they tell me there are no risks.


In response to this question, I am hoping for a list of reasonable risks that we should either accept or mitigate.

Example:

Risk 1: We need to revert to the old repos but can't, because they are now out of date.

Risk 2: The single repo is so large that it takes far longer to download, and everything has to be cloned rather than just the individual parts we need.

I know the above is rubbish, which is why I am asking for suggestions.

Thanks.


Solution

  • In general, monorepos tend to be a bad idea. Some Git operations scale linearly with the number of commits or other objects, so putting a large number of files and a large number of commits into one repository can slow it down significantly. Even if you aren't hitting scale problems now, you may in the future, and at that point it will be significantly harder to extract your code back out into multiple repositories.

    There are workarounds that can make monorepos perform acceptably, such as Microsoft's VFS for Git. However, it's much better not to need them in the first place, since getting them working takes quite a bit of effort.

    Any CI jobs you have will take longer to run, since they'll take longer to clone. You'll also likely end up running CI jobs for the entire monorepo, rather than for individual components, every time anything changes.

    You may also end up with much higher disk usage on developer systems. Developers who previously needed to check out only a few repos now need much more disk space, which may require larger, more expensive machines or VMs.

    Finally, your Git repository will be much larger. If you're hosting in the cloud, that can cause problems. For example, Bitbucket limits all repositories to 2 GB, and other providers may ask you to shrink your repository if its size starts causing performance problems for them. Even if you're hosting locally, large repositories take much more time to pack and repack, requiring more CPU and memory to serve the same number of users.

    Instead of using a monorepo, you can either link multiple repositories together with submodules, or simply keep the commit hash of each dependency's current version in a file in your repository and have a build step check out and build that version whenever the hash changes. These solutions work well for large organizations, and they likely will for you as well.
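If you do go ahead with the monorepo, the CI point above is usually mitigated by running only the jobs for components whose files actually changed. The sketch below is illustrative only: it assumes a hypothetical layout in which each former repo lives in its own top-level directory (`billing/`, `search/`, `web/`) and a hypothetical `shared/` directory whose changes are treated as affecting everything. The list of changed paths would come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: each former repo now lives in its own top-level directory.
COMPONENTS = {"billing", "search", "web"}
# Hypothetical shared code; a change here is assumed to affect every component.
SHARED_DIRS = {"shared"}


def affected_components(changed_paths: list[str]) -> set[str]:
    """Return the components whose CI jobs should run for this set of changed files."""
    affected: set[str] = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if not parts:
            continue
        top = parts[0]
        if top in SHARED_DIRS:
            # A shared change could break anything, so fall back to a full build.
            return set(COMPONENTS)
        if top in COMPONENTS:
            affected.add(top)
        # Paths outside any known component (docs, root files) trigger nothing here.
    return affected


if __name__ == "__main__":
    # In a real pipeline these would come from `git diff --name-only <base>...HEAD`.
    changed = ["billing/api.py", "billing/README.md", "docs/intro.md"]
    print(sorted(affected_components(changed)))
```

This keeps per-change CI cost closer to what the individual repos had, but the path-to-component mapping becomes one more thing the teams have to maintain.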
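To keep an eye on the repository-size risk described above, you can track how large the merged repository actually is. `git count-objects -v` reports `size` (loose objects) and `size-pack` (packed objects) in KiB when run without `-H`. The sketch below parses that output and compares the total against a limit; the default of 2 GiB simply mirrors the Bitbucket figure quoted above and is an assumption, so pass whatever limit your host actually enforces.

```python
# Assumed limit based on the 2 GB Bitbucket figure above, treated here as 2 GiB.
DEFAULT_LIMIT_BYTES = 2 * 1024**3


def repo_size_bytes(count_objects_output: str) -> int:
    """Sum loose-object and pack sizes from `git count-objects -v` output.

    Assumes the command was run without -H, so both fields are plain KiB integers.
    """
    fields: dict[str, str] = {}
    for line in count_objects_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    kib = int(fields.get("size", "0")) + int(fields.get("size-pack", "0"))
    return kib * 1024


def over_limit(count_objects_output: str, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """True if the repository's object storage exceeds the given limit."""
    return repo_size_bytes(count_objects_output) > limit_bytes


if __name__ == "__main__":
    # Sample output for illustration; in practice, capture the real command's stdout.
    sample = (
        "count: 12\nsize: 48\nin-pack: 250000\npacks: 1\n"
        "size-pack: 1048576\nprune-packable: 0\ngarbage: 0\nsize-garbage: 0\n"
    )
    print(repo_size_bytes(sample), over_limit(sample))
```

Running this periodically in CI against the real command output gives early warning well before a hosting limit is reached.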
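The "hash in a file" alternative mentioned above can be sketched as follows. This assumes a hypothetical pin file with one `<name> <commit>` pair per line and a record of the commit each dependency was last built at; the build step then rebuilds only the dependencies whose pinned commit has changed. The actual checkout and build commands are left out because they depend on your tooling. A submodule gives you a similar pin for free, since the parent repository records the exact commit of each submodule.

```python
def parse_pins(text: str) -> dict[str, str]:
    """Parse a hypothetical pin file: one `<name> <commit>` pair per line; '#' starts a comment."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Raises ValueError on malformed lines, which a build step should treat as fatal.
        name, commit = line.split()
        pins[name] = commit
    return pins


def needs_rebuild(pinned: dict[str, str], last_built: dict[str, str]) -> list[str]:
    """Return the dependencies whose pinned commit differs from the one last built."""
    return sorted(name for name, commit in pinned.items() if last_built.get(name) != commit)


if __name__ == "__main__":
    # Short hashes for readability; a real pin file would use full commit IDs.
    pin_file = """
    # name    commit
    billing   3f2a9c1
    search    8d41e07
    """
    last_built = {"billing": "3f2a9c1", "search": "1b22aa0"}
    for name in needs_rebuild(parse_pins(pin_file), last_built):
        # A real build step would check out the pinned commit of `name` here and build it.
        print(f"rebuild {name}")
```

Because each team keeps its own repository, reverting is just a matter of changing a pinned hash, which also addresses the "can't go back to the old repos" concern from the question.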