Tags: git, git-merge, git-squash

Keep commits in feature branch after squash merge to master


Goal: After a new feature has been developed in feature-branch, I want to "merge" it to master with a single commit in master's commit history. However, I would still like to be able to access the original commit messages for each changed line, even after feature-branch is deleted.

Rationale: This is like the default behavior of merging a branch into trunk with Subversion. The advantage is that the history of trunk/master remains lean, i.e., it contains only a single, high-level commit message like Develop feature x. However, if I am not sure why a particular part of the code was changed to what it is now, in Subversion I can dig in deeper using svn blame --use-merge-history and see the original commit message.

Potential solution: As far as I understand, with git the single commit in master could be achieved using a git merge --squash strategy. However, it seems that this does not actually create a merge-commit but just a regular commit that does not retain the complete history of feature-branch. In fact, once I delete feature-branch afterwards, its commits will eventually be garbage-collected, since its commit objects are essentially unreachable now.

Thus my question finally is: How can I retain the commits of a feature branch after squash-merging it to master and deleting it, without any extra requirements (like creating a tag for each deleted branch)?


Solution

  • Don't do that. Do a regular merge.

    When you want to view the feature as a single entity, use git log --first-parent. This directs your git log to avoid exploring the side branch.

    Let's look briefly at what the commit graph looks like. The commit graph is a drawing of each commit showing how it connects back to its parent commit(s). The difference here between a regular (non-merge) commit and a merge commit is that a regular commit connects back to just one previous commit, while a merge connects back to two (or more, but you won't make such merges, so there is no need to attempt to draw them here).

    Remember, each commit has a unique hash ID—a big ugly string of digits and letters that means that commit, that every Git in the world agrees is reserved for that commit—but these hash IDs are meaningless to humans, so we can either draw them as little round os, or use uppercase letters to stand in for them. Remember too that you can draw the graph any way you like: what matter are the commits, and their connecting arrows, which always point backwards (from later commits to earlier ones).

    A simple string of commits, then, might look like this:

    ... <-F <-G <-H ...
    

    Somehow, you've found existing commit H's actual hash ID. You use that to have your Git fish out the commit, including things like its author's name and log message, for viewing. Commit H itself includes the actual hash ID of earlier commit G. Your Git can therefore fish out commit G and show you the author name and log message. That commit contains the hash ID of earlier commit F. This process continues until Git reaches the very first commit, which doesn't point back to anything earlier because it can't, or until you get tired of git log output and just stop looking.😀

    How did you find hash ID H? Well, if there's a later commit, you—or your Git—got H from that later commit. But if H is the last commit in branch master, you got H's hash ID out of the name master. When you add a new commit to master, your Git records H's hash in the new commit I, and then writes the new commit's hash ID into the name master. So by definition, a branch name always contains the latest commit's hash ID. Git starts there and works backwards.
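
    As a rough illustration of the paragraph above, the following sketch builds a throwaway repository (the temporary directory, the identity settings, and the one-letter commit messages are assumptions made for the example) and checks that a new commit records the old tip as its parent while the name master moves to the new commit:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # scratch repository, hypothetical location
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
old_tip=$(git rev-parse master)            # the name master holds H's hash

git commit -q --allow-empty -m "I"
new_tip=$(git rev-parse master)            # master now holds I's hash
parent_of_new=$(git rev-parse 'master^')   # the parent hash stored inside I

git cat-file -p master                     # raw commit object: tree, parent, author, committer, message
```

    Here parent_of_new is the same hash as old_tip: commit I points back to H, and only the name master moved.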

    Now let's look at a more complicated set of branches. We won't bother drawing the connecting arrows as arrows any more, since once they're in some commits, they are read-only, frozen for all time. (All parts of every commit are frozen forever like this.) The names move over time, though, so let's draw those arrows:

    ...--F--G--H   <-- master
                \
                 I--J   <-- feature
    

    The name master selects commit H; the name feature selects commit J. If for some reason we go back to master and add a few more commits we get:

    ...--F--G--H--K--L   <-- master
                \
                 I--J   <-- feature
    

    We can draw that like this if we prefer, and for the moment, I do:

                 K--L   <-- master
                /
    ...--F--G--H
                \
                 I--J   <-- feature
    

    If we now git checkout master; git merge feature we'll get a true merge commit:

                 K--L
                /    \
    ...--F--G--H      M   <-- master (HEAD)
                \    /
                 I--J   <-- feature
    

    The attached HEAD is a reminder that master is the branch we have checked out right now, for cases when it matters. This includes when we run git log without saying which commit git log should look at first. Git will use HEAD to find the current commit, which is now M. It also matters when we run git commit to make a new commit: the new commit's parent will be the current commit, and Git will update the current branch name—the one HEAD is attached to—to remember the hash ID of the new commit. That's why M's first parent is L and why master is now commit M. The special feature of a merge commit is that it has two parents. The first one is L, and the second one is J.

    If you run git log right now, Git will first start at commit M, showing you the merge's log message. Then it will look at both commits L and J and try to show you both at the same time. It literally can't, so it picks one to show first. Which one it picks depends on the sorting options you give to git log. The default is to show whichever one has the newest committer timestamp first.

    If you say --first-parent, though, git log won't look at commit J at all. It will look only at the first parent of M, which is L. It will show commit L, then move back one step to commit K and show that, then move back one step to commit H, and show that, and so on.

    (Note that we can now safely delete the name feature.)
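
    The K-L / I-J graph above can be reproduced in a scratch repository to see both views side by side. This is only a sketch: the temporary directory, the identity settings, and the empty one-letter commits are assumptions chosen to mirror the drawing.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # scratch repository, hypothetical location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "H"
git checkout -q -b feature
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git checkout -q master
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"
git merge -q -m "M: merge feature" feature # branches diverged, so this is a true merge

first_parent=$(git log -1 --format=%s 'HEAD^1')          # L
second_parent=$(git log -1 --format=%s 'HEAD^2')         # J
all=$(git rev-list --count HEAD)                         # H I J K L M
main_line=$(git rev-list --count --first-parent HEAD)    # M L K H

git branch -q -d feature                   # safe: J stays reachable through M's second parent
after_delete=$(git rev-list --count HEAD)

git --no-pager log --first-parent --format=%s   # prints M: merge feature, L, K, H (one per line)
```

    Deleting the name feature does not change the count of reachable commits, because I and J are still found through the merge.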

    Fast-forward merges aren't merges

    The reason I inserted commits K-L was to make drawing the graph easier and more symmetric. More realistically, if you develop features on branches and then merge them to master, you'd just have:

    ...--F--G--H   <-- master (HEAD)
                \
                 I--J   <-- feature
    

    when you go to merge feature. Running git merge feature, your Git will notice that the merge base, which was commit H last time, is still commit H, but this time, commit H is also the last commit in master. This means Git can skip the actual work of merging.

    Git calls this kind of not-a-merge operation a fast-forward merge. To avoid it, you'll have to use git merge --no-ff once (or use GitHub's "merge" button, which always does a non-fast-forward, true merge).
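
    To see the fast-forward case concretely, here is a small sketch in a scratch repository (directory, identity, and commit messages are assumptions for the example). After the merge, master simply names the same commit as feature, and that commit has a single parent:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # scratch repository, hypothetical location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "H"
git checkout -q -b feature
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git checkout -q master

git merge -q --ff feature                  # --ff is the default; spelled out here for clarity

# rev-list --parents prints the commit hash followed by its parent hashes
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of master tip: $parent_count"
```

    No new commit was made: the name master just slid forward from H to J.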

    Forcing a real merge with --no-ff

    If we make a --no-ff merge, Git will do a true merge. It will diff the merge base's snapshot (commit H) against the tip of master (also commit H, so this diff is empty), and diff H against J, as a true merge has to; it will then combine these changes and make a merge commit (which I'll just call K this time). That gives us this graph:

    ...--F--G--H------K   <-- master (HEAD)
                \    /
                 I--J   <-- feature
    

    When we run git log here, Git will visit commit K and show it, then visit both H and J. By default, the sort order will make it print J next, then I, then H. So we'll see all the feature commits.

    But if we add --first-parent to our git log, Git will visit commit K. Then it will follow the first parent linkage back to commit H, and show that. Then it will move back to commit G, and show that, and so on.
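
    The same setup with --no-ff can be sketched as follows (again a scratch repository with assumed directory, identity, and commit messages). The merge commit K now has two parents, and the first-parent walk skips I and J:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # scratch repository, hypothetical location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "H"
git checkout -q -b feature
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git checkout -q master

git merge -q --no-ff -m "K: merge feature" feature   # force a real merge commit

parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
all=$(git rev-list --count HEAD)                         # H I J K
main_line=$(git rev-list --count --first-parent HEAD)    # K H
first_parent=$(git log -1 --format=%s 'HEAD^1')          # H
git --no-pager log --first-parent --format=%s            # prints K: merge feature, then H
```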

    We can delete the name feature now, if we like, but we can also keep developing on feature if we like:

    ...--F--G--H------K   <-- master
                \    /
                 I--J   <-- feature (HEAD)
    

    The new placement of HEAD here implies we ran git checkout feature. Now new commits extend feature:

    ...--o--o--o------o   <-- master
                \    /
                 o--o--o--o--o   <-- feature (HEAD)
    

    If we now git checkout master and git merge feature, we'll get a true merge even without forcing one. (There's no harm in adding --no-ff to the merge command, though.) That will look like this:
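
    A sketch of this second round, under the same assumptions as before (scratch repository, made-up identity and commit messages): after the first --no-ff merge, further work on feature means master's tip is no longer an ancestor of feature's tip, so a plain git merge has to create a real merge commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # scratch repository, hypothetical location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "H"
git checkout -q -b feature
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git checkout -q master
git merge -q --no-ff -m "first merge of feature" feature

git checkout -q feature                    # keep developing on feature
git commit -q --allow-empty -m "more work 1"
git commit -q --allow-empty -m "more work 2"
git commit -q --allow-empty -m "more work 3"

git checkout -q master
git merge -q -m "second merge of feature" feature   # no --no-ff needed this time

parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
main_line=$(git rev-list --count --first-parent HEAD)    # second merge, first merge, H
all=$(git rev-list --count HEAD)                         # all eight commits
```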

    ...--o--o--o------o--------o   <-- master (HEAD)
                \    /        /
                 o--o--o--o--o   <-- feature
    

    Using git log --first-parent, Git will show the last commit on master, then the previous merge on master, and so on: we never see the work done on feature.

    It's all there, and easy to find if we want it: just run git log without --first-parent. When the feature is truly finished, and the last merge is in place, you can safely delete the name feature. Meanwhile, you can at any time create new features, starting from any commit anywhere you like in the graph, work on them, and eventually merge them. For instance, suppose you need to put in a quick fix on master:

    ...--o--o--o------o--------o--o   <-- master (HEAD)
                \    /        /
                 o--o--o--o--o   <-- feature
    

    and now make a secondary feature2:

    ...--o--o--o------o--------o--o   <-- master, feature2 (HEAD)
                \    /        /
                 o--o--o--o--o   <-- feature
    

    and start committing on feature2:

                                    o--o   <-- feature2 (HEAD)
                                   /
    ...--o--o--o------o--------o--o   <-- master
                \    /        /
                 o--o--o--o--o   <-- feature
    

    while continuing work on feature:

                                    o--o   <-- feature2
                                   /
    ...--o--o--o------o--------o--o   <-- master
                \    /        /
                 o--o--o--o--o--o--o--o   <-- feature (HEAD)
    

    When you're ready, you can merge feature2, which again requires --no-ff:

                                    X--Y   <-- feature2
                                   /    \
    ...--o--o--o------o--------o--W------Z   <-- master (HEAD)
                \    /        /
                 o--o--o--o--o--o--o--o   <-- feature
    

    (note that the first parent of Z is W, not Y; note that we're running out of letters, which is why Git doesn't use simple short numbers or letters for commit IDs!).

    Maybe feature is just about done now, so we merge it into master one last time and delete its name:

                                    o--o   <-- feature2
                                   /    \
    ...--o--o--o------o--------o--o------o----o   <-- master (HEAD)
                \    /        /              /
                 o--o--o--o--o--o--o--o--o--o
    

    The --first-parent flag continues to do the job of following only the master-to-previously-on-master linkage, right down the middle of the graph, with no peeking at the side views.
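
    For the per-line view the question asks about (the svn blame --use-merge-history analogy), git blame accepts the same --first-parent option in reasonably recent Git versions. The sketch below assumes a scratch repository with a hypothetical file f.txt and made-up commit messages. Plain git blame attributes the line to the original feature commit, even after the branch name is deleted, while git blame --first-parent attributes it to the merge commit on master:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # scratch repository, hypothetical location
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'base line\n' > f.txt               # f.txt is a made-up file for the example
git add f.txt
git commit -q -m "H: add f.txt"

git checkout -q -b feature
printf 'feature line\n' >> f.txt
git commit -q -am "I: detailed reason for this line"
feature_tip=$(git rev-parse feature)       # remember I's hash before the name goes away

git checkout -q master
git merge -q --no-ff -m "K: develop feature x" feature
git branch -q -d feature                   # commit I stays reachable through the merge

# The first porcelain line starts with the hash of the commit blamed for the line
detail=$(git blame --porcelain -L 2,2 f.txt | head -n 1 | cut -d' ' -f1)
summary=$(git blame --porcelain --first-parent -L 2,2 f.txt | head -n 1 | cut -d' ' -f1)

git --no-pager log -1 --format=%s "$detail"    # prints I: detailed reason for this line
git --no-pager log -1 --format=%s "$summary"   # prints K: develop feature x
```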