Tags: git, git-submodules, git-subtree, git-repo, git-slave

Best practices for multiple git repositories


I have around 20 different repositories. Many are independent and compile as libraries, but some depend on others. Dependency resolution and branching are complicated.

Suppose that I have a super project that only aggregates all other repositories. It is used exclusively to run tests -- no real development goes here.

/superproject  [master, HEAD]
    /a         [master, HEAD]
    /b         [master, HEAD]
    /c         [master, HEAD]
    /...

Now, to develop a feature or fix for one project (a), especially one that requires specific versions of other projects to compile or run (b v2.0 and c v3.0), I have to create a new branch:

/superproject  [branch-a, HEAD]  <-- branch for 'a' project
    /a         [master]  <-- new commits here
    /b         [v2.0]
    /c         [v3.0]

For b, something different might be required, such as a v0.9 and c v3.1:

/superproject  [branch-b, HEAD]  <-- branch for 'b' project
    /a         [v0.9]   <-- older than the 'a' used on branch-a
    /b         [master] <-- new commits go here
    /c         [v3.1]   <-- newer than the 'c' used on branch-a

This gets even more complicated with common git workflows involving feature branches, hotfix branches, release branches, etc. I have been advised both to use and to avoid git-submodules, git-subtree, Google's git-repo, git-slave, etc.

How can I manage continuous integration for such a complex project?

EDIT

The real question is: how can I run tests without having to mock all the other dependent projects, especially when the projects might use different versions? Related: Trigger Jenkins tests after commits in git submodules


Solution

  • For working on multiple branches in parallel, use parallel clones if possible. cd is an awful lot easier than checkout, clean, check-for-stale-detritus, and recreate-caches every time you want to switch.
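
    One way to read "parallel clones" is one clone of the superproject per line of work. The sketch below is a throwaway demo under stated assumptions: the `upstream` repository, the `super-a` / `super-b` directory names, and the demo identity are all made up for illustration, and in practice the clone source would be your real superproject.

```shell
#!/bin/sh
set -e
# Throwaway identity and scratch directory, for the demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# A stand-in for the real superproject, with the two branches from the question.
git init -q upstream
cd upstream
git commit -q --allow-empty -m "initial"
git branch branch-a
git branch branch-b
cd ..

# One clone per branch: each directory keeps its own work tree, build output and caches.
git clone -q --branch branch-a upstream super-a
git clone -q --branch branch-b upstream super-b

# Switching context is now a directory change, not a checkout.
branch_a=$(cd super-a && git rev-parse --abbrev-ref HEAD)
branch_b=$(cd super-b && git rev-parse --abbrev-ref HEAD)
echo "$branch_a $branch_b"
```

    The cost is disk space and an extra fetch per clone; the gain is that nothing in one work tree can go stale because of work in another.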


    So far as recording your test environments goes, what you're describing is exactly what submodules do, in every detail. For something this simple, I recommend setting yourself up without using the submodule command at all, and telling it about your setup only once you're comfortable and the top item on your submodule-issues list is keystroke count.

    Starting from the setup in your question, here's how you set yourself up to record clean builds in the subprojects:

    cd "$superproject"
    git init .
    # add each subproject directory; 'etc' stands for the rest of them
    git add a b c etc
    git commit -m "recording test state for $thistest"
    

    That's it. You've committed a list of commit ids, i.e. the ids of the currently checked-out commits in each of those repos. The actual content is in those repos, not this one, but that's the entire difference between files and submodules so far as git is concerned. The .gitmodules file holds notes to help cloners, mainly a suggested repo that's supposed to contain the necessary commits, plus some command defaults, but what it's doing is easy and obvious.
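
    To see what that commit actually recorded, here is a small throwaway demo. The scratch directory, the demo identity, and the single stand-in subproject `a` are assumptions for illustration; the commands themselves are the ones used above plus `git ls-tree`.

```shell
#!/bin/sh
set -e
# Throwaway identity and scratch directory, for the demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
mkdir "$work/superproject"
cd "$work/superproject"

# An independent repository 'a' with one commit, standing in for a real subproject.
git init -q a
(cd a && echo hello > README && git add README && git commit -q -m "a: initial")

# Record the current state of 'a' in the superproject.
git init -q .
git add a    # git may warn about an embedded repository; that is expected here
git commit -q -m "recording test state"

# The tree entry for 'a' has mode 160000 and type 'commit', not 'tree' or 'blob'.
entry=$(git ls-tree HEAD a)
echo "$entry"

# Its id is exactly the commit that 'a' has checked out.
recorded=$(git rev-parse HEAD:a)
actual=$(cd a && git rev-parse HEAD)
echo "recorded=$recorded actual=$actual"
```

    None of a's files are stored in the superproject; only that one commit id is.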

    Want to check out the right commit at path foo?

    (commit=$(git rev-parse :foo) && cd foo && git checkout "$commit")
    

    The rev-parse fetches the commit id recorded for foo in the index; the cd and checkout then check that commit out in foo.

    Here's how you find all your submodules and what should be checked out there to recreate the staged aka indexed environment:

    git ls-files -s | grep ^16
    

    Check what your current index lists for a submodule and what's actually checked out there:

    echo $(git rev-parse ":$submodule"; (cd "$submodule" && git rev-parse HEAD))
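
    The last two snippets combine naturally into a report of every submodule whose checkout has drifted from the index. This is only a sketch: `report_drift` is a hypothetical helper name, it must be run from the top of the superproject, and it assumes each listed path really holds its own repository.

```shell
#!/bin/sh
# report_drift: hypothetical helper, run from the top of the superproject.
# Prints one line per gitlink entry whose checked-out commit differs from
# the commit recorded in the index; prints nothing when everything matches.
report_drift() {
    git ls-files -s | grep '^160000' | while read -r mode recorded stage path; do
        # Skip paths that cannot be entered or have no commit checked out.
        actual=$(cd "$path" && git rev-parse HEAD) || continue
        if [ "$recorded" != "$actual" ]; then
            echo "$path: index has $recorded, work tree has $actual"
        fi
    done
}
```

    Calling `report_drift` before committing the superproject shows which entries a `git add` would move.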
    

    And there you go. Want to check out the right commits in all your submodules?

    git ls-files -s | grep ^16 | while read -r mode commit stage path; do
            (cd "$path" && git checkout "$commit")
    done
    

    Sometimes you're carrying local patches that you want reapplied on top of every checkout:

    git ls-files -s | grep ^16 | while read -r mode commit stage path; do
            (cd "$path" && git rebase "$commit")
    done
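
    What the rebase variant does is easiest to see on a toy case. The demo below is a sketch under stated assumptions: the scratch directory, the demo identity, the `local-patches` branch name, and the file names are all invented; only the loop itself comes from the text above.

```shell
#!/bin/sh
set -e
# Throwaway identity and scratch directory, for the demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
mkdir "$work/superproject"
cd "$work/superproject"

# Subproject 'a': one upstream commit, then a local patch on its own branch.
git init -q a
cd a
echo 1 > base.txt && git add base.txt && git commit -q -m "upstream 1"
git checkout -q -b local-patches
echo patch > local.txt && git add local.txt && git commit -q -m "local patch"
# Upstream moves on while the patch sits on the side branch.
git checkout -q -
echo 2 > base.txt && git commit -q -am "upstream 2"
cd ..

# The superproject index records 'upstream 2' for a ...
git init -q .
git add a    # git may warn about an embedded repository; that is expected here
# ... while a's work tree goes back to the patched branch.
(cd a && git checkout -q local-patches)

# The loop from above: replay the current branch onto the recorded commit.
git ls-files -s | grep '^160000' | while read -r mode commit stage path; do
    (cd "$path" && git rebase -q "$commit")
done

recorded=$(git rev-parse :a)
parent=$(cd a && git rev-parse HEAD^)
echo "recorded=$recorded parent-of-patch=$parent"
```

    After the loop, the local patch sits directly on top of the recorded commit, so the two ids printed at the end are the same.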
    

    And so forth. There are git submodule commands for these, but they're not doing anything you don't see above. The same goes for all the rest: you can translate everything they do into near-one-liners like the ones above.

    There's nothing mysterious about submodules.


    Continuous integration is generally done with any of a whole lot of tools; I'll leave that for someone else to address.