Git - diff between branches when using cherry-picking

We are currently using Azure DevOps. We have a repository with one branch per version: version 1, version 2 and version 3.

When a bug is fixed in version 1, we would like to have the same fix in version 2 and version 3. But version 1 is not a subset of version 2, so we cannot simply merge one into the other, and the same holds between version 2 and version 3. Hence we cherry-pick the fix.

There is a risk that a fix does not make it into all versions. I want to see the commits made in version 1 but not in version 3, excluding commits that have already been cherry-picked into version 3. Is there a way to diff between branches that also handles cherry-picked commits?

Solution

Cherry-picking is, I and others would argue, the wrong tool for solving the problem at hand: see Raymond Chen's blog series of articles titled Stop cherry-picking, start merging. As Lasse V. Karlsen comments, cherry-picked commits are not tied together. Merges are tied together through the commit graph, so merging gives a much clearer indication of where a fix has and has not gone, and hence of where additional merges may be required if the fix turns out to be incomplete.
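
For the question as literally asked, Git's closest built-in tool is patch-ID matching: git cherry (or git log --cherry-pick --right-only release-3...release-1) hides commits whose diff already appears on the other branch. Below is a minimal sketch wrapped in a hypothetical helper named list_unpicked_fixes; the branch names are placeholders for the ones in the question. Because the match is on patch text, a cherry-pick that needed conflict resolution, or was amended afterwards, gets a different patch ID and is still reported as missing, which is exactly the "not tied together" problem.

```shell
# Hypothetical helper: list commits on SOURCE that have no patch-equivalent
# commit on TARGET, i.e. fixes that apparently were never cherry-picked over.
# Usage: list_unpicked_fixes TARGET SOURCE   (e.g. release-3 release-1)
list_unpicked_fixes() {
    # git cherry -v marks each commit on SOURCE that is not reachable from TARGET:
    # "+ <sha> <subject>" means no equivalent patch exists on TARGET,
    # "- <sha> <subject>" means an equivalent patch is already there.
    git cherry -v "$1" "$2" | grep '^+'
}
```

Note that this also lists ordinary, non-fix commits that exist only on the source branch, so it is a starting point for review rather than a verdict.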

After you've read through the linked blog posts (or my extremely limited summary below) and changed your overall work-flow, you'll be creating a branch for fixing some specific bug and doing the fix work there. You'll then merge this branch into each release-candidate branch that has the bug. If the fix proves inadequate, you'll add more commits to the fix branch and will need to re-merge it into each release candidate. Depending on your bug-tracking system, you might want to have Git find the branches into which the previous tip commit of the fix branch was merged. To do that, you can use git branch --contains (or, if you want to build your own tool, git for-each-ref --contains; for-each-ref is the plumbing version of some of the porcelain parts of git branch and git tag). So using the merging approach gives you better tools.
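
Here is a minimal sketch of the for-each-ref route, wrapped in a hypothetical shell function named branches_containing. It assumes you have recorded the hash of the fix branch's previous tip somewhere (in your bug tracker, say), and it only looks at local branches under refs/heads/.

```shell
# Hypothetical helper: print each local branch whose history contains COMMIT,
# e.g. the previous tip of the fix branch.
# Usage: branches_containing COMMIT
branches_containing() {
    # --contains on for-each-ref needs Git 2.7 or later;
    # %(refname:short) prints "release-1" rather than "refs/heads/release-1".
    git for-each-ref --contains "$1" --format='%(refname:short)' refs/heads/
}
```

For interactive use, git branch --contains <commit> prints the same list in porcelain form.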

How to fix the bug by merging

Note that this is only my summary of the blog posts; there is a lot more to learn there. I include it because Stack Overflow answers are supposed to be self-contained (at least as far as the SO system itself is concerned): blog links change over time and need maintenance.

Over time, we write code that has bugs that don't show themselves immediately. A good version control system lets us trace those bugs back to their point of origin. If we draw this as a simplified Git commit history graph, it winds up looking like this:

                     o--...--A   <-- release-1
                    /
                   /  o--...--B   <-- release-2
                  /  /
...--o--o--X--o--:--:--...--C   <-- release-3
                  \  \
                   \  o--...--D   <-- release-4
                    \
                     o--...--E   <-- release-5

Here, commit X is the one with the actual bug in it. As we can see from the diagram, this bug now infects five releases.
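
If you want Git to tell you which releases X infects, one way is to ask, branch by branch, whether X is an ancestor of that branch's tip. The sketch below does this with git merge-base --is-ancestor; the function name infected_branches is hypothetical, and you supply the candidate branch names yourself.

```shell
# Hypothetical helper: print which of the given branches contain the buggy
# commit, i.e. which releases the bug has "infected".
# Usage: infected_branches BUGGY_COMMIT BRANCH...
infected_branches() {
    buggy_commit=$1
    shift
    for branch in "$@"; do
        # --is-ancestor exits 0 if buggy_commit is reachable from the branch tip
        if git merge-base --is-ancestor "$buggy_commit" "$branch"; then
            echo "$branch"
        fi
    done
}
```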

We can't fix the bug in the past, so what we often do—naively—is fix the bug in one of the releases:

                     o--...--A--F1   <-- release-1
                    /
                   /  o--...--B   <-- release-2
                  /  /
...--o--o--X--o--:--:--...--C   <-- release-3
                  \  \
                   \  o--...--D   <-- release-4
                    \
                     o--...--E   <-- release-5

where F1 is the commit that fixes the bug. But now we have to copy F1 to each release:

                     o--...--A--F1   <-- release-1
                    /
                   /  o--...--B--F2   <-- release-2
                  /  /
...--o--o--X--o--:--:--...--C--F3   <-- release-3
                  \  \
                   \  o--...--D--F4   <-- release-4
                    \
                     o--...--E--F5   <-- release-5
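
The copying step is typically done with git cherry-pick. A minimal sketch, assuming F1 is identified by its hash and the target branches already exist locally; copy_fix is a hypothetical name. The -x option records the original hash in each copy's message, but that is only text: the commit graph still has no link between F1 and its copies F2 through F5.

```shell
# Hypothetical helper: copy one fix commit onto several branches with
# cherry-pick, producing separate, unconnected copies of the fix.
# Usage: copy_fix COMMIT BRANCH...
copy_fix() {
    fix_commit=$1
    shift
    for branch in "$@"; do
        git checkout -q "$branch" || return 1
        # -x appends "(cherry picked from commit <sha>)" to the copy's message.
        # On a conflict, stop and leave the cherry-pick for the user to finish.
        git cherry-pick -x "$fix_commit" || return 1
    done
}
```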

There's nothing fundamentally wrong with this if the fix is simple and obvious and won't need to be revisited. We'll end up with five commits, which is the minimum needed to actually fix the problem.

But what if the fix is subtle and complicated, or introduces performance issues that must be resolved later, or otherwise might need additional work? Later, as we come up with another commit or series of commits to fix things, we'll have to go back and revise each branch individually. What if we could go back in time to our old code and fix the problem where it happened, at commit X? Well, with a version control system, we can.¹

We check out historic commit X directly, then attach a new branch name to it. That's a bit hard to draw the way I've been drawing graphs, but here's an attempt:

                     o--...--A   <-- release-1
                    /
                   /  o--...--B   <-- release-2
                  /  /
...--o--o--X--o--:--:--...--C   <-- release-3
            .     \  \
             .     \  o--...--D   <-- release-4
              .     \
               .     o--...--E   <-- release-5
                .
                 . <-- fix-bug-where-it-cropped-up
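
In command form, this is a single git checkout -b with a start point. The sketch wraps it in a hypothetical start_fix_branch function that first checks that the start point names a commit; X stands for whatever hash you found for the buggy commit.

```shell
# Hypothetical helper: create and check out a new branch whose tip is the
# commit that introduced the bug (X in the diagram).
# Usage: start_fix_branch NEW_BRANCH BUGGY_COMMIT
start_fix_branch() {
    # Refuse to continue unless the second argument names an existing commit.
    git rev-parse -q --verify "$2^{commit}" > /dev/null || return 1
    # Same effect as "git checkout X" followed by "git checkout -b NEW_BRANCH".
    git checkout -b "$1" "$2"
}
```

After start_fix_branch fix-bug-where-it-cropped-up <hash-of-X>, the fix commits are made with ordinary git commit.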

Now, at this point, we make our fix commit(s):

                     o--...--A   <-- release-1
                    /
                   /  o--...--B   <-- release-2
                  /  /
...--o--o--X--o--:--:--...--C   <-- release-3
            .     \  \
             .     \  o--...--D   <-- release-4
              .     \
               .     o--...--E   <-- release-5
                .
                 .--F1--F2--F3--F4 <-- fix-bug-where-it-cropped-up

The tip of this branch—commit F4 at this point—can now be merged back into each existing release or development branch. Each merge generates a new merge commit, so we end up with more total commits than with the super-simple case for which cherry-picking is fine. But these commits actually record the merges. Here's the one that goes into release-5:

                     o--...--A   <-- release-1
                    /
                   /  o--...--B   <-- release-2
                  /  /
...--o--o--X--o--:--:--...--C   <-- release-3
            .     \  \
             .     \  o--...--D   <-- release-4
              .     \
               .     o--...--E-----M5   <-- release-5
                .                 /
                 .--F1--F2--F3--F4 <-- fix-bug-where-it-cropped-up
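
A sketch of the merge step, assuming every release branch exists locally and the fix merges without conflicts; merge_fix is a hypothetical name. The --no-ff option is included so that each merge always produces a merge commit like M5, even where a fast-forward would be possible, and --no-edit keeps Git's default merge message.

```shell
# Hypothetical helper: merge the fix branch into each listed branch.
# Usage: merge_fix FIX_BRANCH BRANCH...
merge_fix() {
    fix_branch=$1
    shift
    for branch in "$@"; do
        git checkout -q "$branch" || return 1
        # --no-ff: always record a merge commit; --no-edit: accept the default
        # "Merge branch '<fix>' ..." message. Stop at the first failed merge.
        git merge --no-ff --no-edit "$fix_branch" || return 1
    done
}
```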

(I won't attempt to draw the other merges, as the graph rapidly becomes very messy.) Should we discover later that the F1-F2-F3-F4 chain of commits is inadequate or incorrect, we can add more commits (F5 and so on) to repair it, and then re-do each merge.
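
Re-doing the merges can be combined with the branch search mentioned earlier: the branches that need a re-merge are the ones that contain the old tip (F4 here) but not the current tip of the fix branch. A sketch under those assumptions, with the hypothetical name remerge_fix; it needs Git 2.13 or later for --no-contains, assumes branch names without whitespace, and stops at the first merge that fails.

```shell
# Hypothetical helper: re-merge the fix branch into every local branch that
# already has OLD_TIP merged in but does not yet have the current fix tip.
# Usage: remerge_fix FIX_BRANCH OLD_TIP
remerge_fix() {
    fix_branch=$1
    old_tip=$2
    stale=$(git for-each-ref --contains "$old_tip" --no-contains "$fix_branch" \
        --format='%(refname:short)' refs/heads/)
    for branch in $stale; do
        git checkout -q "$branch" || return 1
        git merge --no-ff --no-edit "$fix_branch" || return 1
    done
}
```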

This isn't a panacea, by any means, but it's better than the cherry-pick method because it pins the fix to the bug and leaves traces in the Git commit graph. The Git commit graph is the history of the project, so these traces show that this was an important fix, brought back into multiple releases. Note that the graph itself is rather crude and mechanical, so it's important to include good commit messages. The messages that git merge makes by default are not great, but they do have the advantage of being predictably formed, and hence mechanically searchable.
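
Because the default messages begin with Merge branch 'fix-bug-where-it-cropped-up' (usually followed by "into <branch>"), a plain text search over commit messages finds them. A minimal sketch, using the hypothetical helper name find_fix_merges and searching every local branch:

```shell
# Hypothetical helper: list merge commits, on any local branch, whose message
# contains Git's default "Merge branch '<FIX_BRANCH>'" wording.
# Usage: find_fix_merges FIX_BRANCH
find_fix_merges() {
    # --fixed-strings treats the pattern literally, since branch names may
    # contain regex metacharacters such as '.'; the closing quote in the
    # pattern keeps 'fix' from also matching 'fix-other'.
    git log --branches --merges --oneline --fixed-strings \
        --grep="Merge branch '$1'"
}
```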

¹There are other conditions that apply as well: having a VCS is necessary, but not sufficient.