Managing the unification of several versions of a repo into a generic base branch - best practice workflow?

I've been given the task of 'taking ownership' of one of our plugins, which over the course of a couple of years has diverged into different versions for different clients, such that we have about 5 different 'prod' versions, some with the same features added, some with features missing.

Basically what I need to do is create one generic base branch merging all the various changes in and making sure nothing is missed. Then we want to start using the generic branch for all clients.

What would be the best way to approach this?

I could start with a 'latest' branch (i.e. the one that's had the most changes made to it), make a new branch off of that (e.g. base-branch) and then try rebasing in the other 'prod' branches.
Or I could start back at the master branch (which is some way behind now) and try rebasing everything into that.
Or is there a better way to do this?

I've not done anything like this with a big project and lots of different versions, so I'm not sure of the best practices to ensure, firstly, that I don't miss any commits and end up with missing features, and secondly, what general workflow is going to make this as painless as possible.

Thanks.

Solution

git is a little bit like a time machine.

You are in a bind now because you've got five unique versions of the software to reconcile into a unified product. With hindsight this problem wouldn't even exist if the branches had been merged and refactored into a common product as the features were added. You've probably got something like this as your tree;

           C--D
          /
R--*--*--*--*--*--E
    \  \
     A  B

Even if you don't have a tree at the moment, as long as you know the history you can infer one and use the same process, copying each version over your working copy and committing it instead of merging.
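If the versions only exist as directory snapshots, the copy-over approach could look roughly like the sketch below. It runs in a throwaway directory; the snapshot folders (v1, v2) and file names are made up for illustration and stand in for your real client versions, oldest first.

```shell
#!/bin/sh
# Sketch only: rebuild a history from plain directory snapshots.
# All paths, file names and version labels here are hypothetical.
set -e
work=$(mktemp -d)
mkdir -p "$work/snapshots/v1" "$work/snapshots/v2" "$work/repo"
echo "one"   > "$work/snapshots/v1/plugin.txt"
echo "two"   > "$work/snapshots/v2/plugin.txt"
echo "extra" > "$work/snapshots/v2/extra.txt"

cd "$work/repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

for version in v1 v2; do
    # Remove tracked files first so deletions between snapshots are recorded.
    git rm -r -q --ignore-unmatch .
    # Copy the snapshot contents (including dotfiles) over the working copy.
    cp -R "$work/snapshots/$version/." .
    git add -A
    git commit -q -m "Import snapshot $version"
    git tag "$version"
done

git log --oneline
```

Once each snapshot is a tagged commit you can branch from them and treat them like the A, B, C... commits in the diagrams.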

What would be amazing is if we could go back in time to when we had our first forks (A, B) and merge them into a base-branch. You can do that by starting the new branch at the old commit with the git checkout command;

$ git checkout -b base-branch R

Gives;

R--*--*
    \  \
     A  B

This feels more manageable; git merge A is a fast-forward, then git merge B will require some refactoring, but we'll have;

     *--B   
    /    \
R--*---a--b <- (base-branch)
    \ /
     A
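To see those two merges behave as described, here is a small sketch in a scratch repository. The branch names A and B, the tag R and all file contents are placeholders for illustration; in your repo you would use your real branch or commit names.

```shell
#!/bin/sh
# Sketch only: reproduce the R / A / B shape in a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Helper (hypothetical): write a file and commit it.
commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit core.txt "core" "R"
git tag R
commit core.txt "core v2" "mainline 1"
git checkout -q -b A
commit feature_a.txt "a" "client A feature"
git checkout -q main
commit core.txt "core v3" "mainline 2"
git checkout -q -b B
commit feature_b.txt "b" "client B feature"

# Go back to R and start the unified branch there.
git checkout -q -b base-branch R

# R is an ancestor of A, so this only moves the branch pointer.
git merge -q --ff-only A

# B has diverged from A, so this records a real merge commit.
git merge -q --no-edit B

git log --oneline --graph base-branch
```

In this toy case B merges cleanly; in a real plugin this is the point where you would stop, resolve and refactor before committing the merge.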

This is exactly what you would have wanted to happen. The first feature-set, A, is merged into base-branch, followed by feature-set B with all the necessary refactoring.

Imagine you're back in time again, but in this timeline we've got base-branch and we've just released C, D and E. We want those feature-sets in the base-branch. You can continue with git merge C, refactor, then git merge D, refactor again (it may not be a fast-forward anymore). Finally git merge E, refactor and you're done.

         *--*--E
        /       \
       *--C--D   \
      /    \  \   \
     *--B   \  \   \
    /    \   \  \   \
R--*---a--b---c--d---e <- (base-branch)
    \ /
     A
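Each of those "merge, refactor" steps will usually stop on a conflict. A rough sketch of one such round trip, again in a scratch repo, follows; the file settings.txt, its contents and the branch name C are invented for the example, and the echo that "resolves" the conflict stands in for your real refactoring work.

```shell
#!/bin/sh
# Sketch only: one merge that conflicts, is resolved by hand, then committed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/base-branch
git config user.name "Example"
git config user.email "example@example.com"

echo "timeout=10" > settings.txt
git add settings.txt
git commit -q -m "shared starting point"

git checkout -q -b C
echo "timeout=30" > settings.txt
git commit -q -am "client C: longer timeout"

git checkout -q base-branch
echo "timeout=20" > settings.txt
git commit -q -am "base-branch: change from an earlier merge"

# Both sides changed the same line, so the merge stops with a conflict.
if ! git merge --no-edit C >/dev/null 2>&1; then
    # List the files that still need attention.
    git diff --name-only --diff-filter=U
    # Placeholder for the manual refactoring: pick the reconciled content.
    echo "timeout=30" > settings.txt
    git add settings.txt
    # Conclude the merge using the prepared merge message.
    git commit -q --no-edit
fi

git log --oneline --graph
```

Repeat the same cycle for D and E, and run your tests after each merge rather than only at the end, so a breakage points at one feature-set.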

What you end up with is a history of your project that looks exactly like what you'd want. Each feature-set developed in chronological order as though it'd been done correctly all along.
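To check that no commits were missed, you can ask git which branches still contain work that base-branch does not. The sketch below demonstrates this in a scratch repo; prod-a and prod-b are hypothetical branch names standing in for your client branches.

```shell
#!/bin/sh
# Sketch only: find branches (and commits) not yet merged into base-branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/base-branch
git config user.name "Example"
git config user.email "example@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "R"

# Two client branches, each with one commit of its own.
for client in prod-a prod-b; do
    git checkout -q -b "$client" base-branch
    echo "$client" > "$client.txt"
    git add "$client.txt"
    git commit -q -m "$client feature"
done

git checkout -q base-branch
git merge -q --no-edit prod-a

# Branches whose tips are not reachable from base-branch.
unmerged=$(git branch --format='%(refname:short)' --no-merged base-branch)
echo "still to merge: $unmerged"

# The individual commits on prod-b that base-branch is missing.
git log --oneline base-branch..prod-b
```

When the first command prints nothing for your client branches, every commit on them is reachable from base-branch. That only proves the commits were merged, not that the features survived your conflict resolution, so it complements testing rather than replacing it.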

This won't be easy and you'll be doing more refactoring than gitting. Put yourself in the mindset of the authors at the time and merge as they would have with a little bit of your hindsight sprinkled in.