
What is the equivalent of git cherry-pick in plumbing commands? How do I implement it in GitPython?


I need to do a cherry-pick in GitPython. I couldn't find a command for it, so I assume I have to do it some other way.

Also, I'm just interested in how it works internally.

I understand cherry-picking commit A as trying to apply the diff between A^ and A to HEAD. But I suspect it can be expressed in terms of merges somehow. That's why I'm asking about plumbing commands.

I looked for something like git-cherry-pick.sh in the Git repository on GitHub but found nothing except tests and documentation.


Solution

  • Cherry-picking is a fundamental building block that has no non-fundamental equivalent. That is, there's no lower-level operation that's "pure plumbing". The reason is that cherry-pick does a merge with a (potentially, at least) somewhat screwy merge base: the parent of the commit being picked, which need not be a common ancestor of that commit and HEAD.

    That said, with proper inputs, git apply implements cherry-picking when run as git apply -3 (but git apply is not a plumbing command either). This works by using the index line that git diff emits for each file (index <old-hash>..<new-hash>), which supplies the otherwise-missing merge base information. One difference remains, though, and it has to do with rename detection.

    If there are no renames, the two are equivalent. This is because a merge operation has one key difference from a simple patch: a merge has a merge base, from which we can derive two patches.

    Consider the following sequence:

    • Alice and Bob start with a common Git repository, with file readme.txt in some commit.

    • Alice changes line 10 so that instead of saying "bees are purple", it says "bees are green". She also changes line 9 so that the file says "Everything below is bizarre." (And then, of course, Alice commits the new files.)

    • Bob changes line 10 so that instead of saying "bees are purple", it says "bees are green", and also adds a new line 20 claiming that "submarines climb trees."

    Now, if Alice gets Bob's change as a plain patch (with no index line, just a context diff, e.g., from diff -u) and feeds that into her Git, Alice's Git won't know what to do with Bob's change to line 10. It will have no problem with the line-20 addition, but the context for the "bees are green" change doesn't match: Bob's patch expects the old line 9 and the old "bees are purple" line, while Alice's file already has the "bizarre" line and "bees are green".

    If, on the other hand, Alice gets Bob's change as a "cherry-pick-able patch" (either by running an actual git cherry-pick or by getting a diff with an index line and using git apply -3 or equivalent), Alice's Git now has more information. Alice's Git can now see not just that Bob changed readme.txt, but which version of that file he had when he started. Specifically, the index line has the blob hash of the "before" version of readme.txt, and since Alice and Bob started with the same version of the file, in the same commit, Alice's repository already contains that blob. (The line also has the hash of Bob's "after" version, which Alice doesn't have. The entire "after" version could be reconstructed from the base and the patch if necessary, but it's unnecessary.)
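To make the "before" and "after" blob hashes concrete, here is a small sketch that pulls them out of a patch's index header lines. The patch text and its hashes are made up for illustration, and blob_ids is a hypothetical helper name, not part of Git or GitPython.

```python
import re

# One `index` header line: old blob hash, new blob hash, optional file mode.
INDEX_LINE = re.compile(
    r"^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: (\d{6}))?$", re.MULTILINE
)

# Hypothetical patch text; the hashes are invented for this example.
SAMPLE_PATCH = """\
diff --git a/readme.txt b/readme.txt
index 3b18e51..a0423d8 100644
--- a/readme.txt
+++ b/readme.txt
@@ -10 +10 @@
-bees are purple
+bees are green
"""


def blob_ids(patch_text):
    """Return (before_hash, after_hash) pairs, one per file in the patch."""
    return [(m.group(1), m.group(2)) for m in INDEX_LINE.finditer(patch_text)]


print(blob_ids(SAMPLE_PATCH))
```

The first hash of each pair names the version the patch was made against, which is the piece of information a bare context diff lacks.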

    Now Alice's Git can run its own diff: it can diff the base version against Alice's current version, to see what Alice did. Then it could diff the base version against Bob's version, to get the patch-with-base that it already has (but why bother? that's the patch it already has!). Now it can (try to) combine the two patches: it sees that Bob's change to line 10 is redundant—it's contained within Alice's own changes—and concentrates only on line 20. Now Alice's Git can apply the patch.
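The combine-two-patches idea can be sketched as a toy line-based three-way merge. This is only an illustration built on Python's difflib, not Git's actual merge code: the function names (matched, resolve, merge3) are invented here, and resolving same-length regions line by line is a simplification, so Git's real hunk boundaries and conflict rules can differ.

```python
from difflib import SequenceMatcher


def matched(base, other):
    """Map each base index that survives unchanged to its index in `other`."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return {
        blk.a + k: blk.b + k
        for blk in sm.get_matching_blocks()
        for k in range(blk.size)
    }


def resolve(base, ours, theirs):
    """Three-way rule for one region; returns None on a true conflict."""
    if ours == base:
        return theirs  # only "theirs" changed this region
    if theirs == base or ours == theirs:
        return ours  # only "ours" changed it, or both made the same change
    return None


def merge3(base, ours, theirs):
    """Toy three-way merge of line lists. Returns (merged, had_conflict)."""
    in_ours, in_theirs = matched(base, ours), matched(base, theirs)
    # Base lines untouched on both sides act as synchronisation points.
    sync = [i for i in range(len(base)) if i in in_ours and i in in_theirs]
    sync.append(len(base))  # sentinel so the tail is handled like any region
    merged, conflict = [], False
    b = o = t = 0
    for i in sync:
        end_o = in_ours.get(i, len(ours))
        end_t = in_theirs.get(i, len(theirs))
        cb, co, ct = base[b:i], ours[o:end_o], theirs[t:end_t]
        if len(cb) == len(co) == len(ct):
            # Simplification: same-length regions are resolved line by line.
            parts = [resolve([x], [y], [z]) for x, y, z in zip(cb, co, ct)]
        else:
            parts = [resolve(cb, co, ct)]
        if any(p is None for p in parts):
            conflict = True
            merged += ["<<<<<<< ours", *co, "=======", *ct, ">>>>>>> theirs"]
        else:
            for p in parts:
                merged += p
        if i < len(base):
            merged.append(base[i])  # the synchronisation line itself
        b, o, t = i + 1, end_o + 1, end_t + 1
    return merged, conflict


# The readme.txt scenario: 19 base lines, with line 10 about bees.
base = [f"line {n}" for n in range(1, 20)]
base[9] = "bees are purple"
alice = list(base)
alice[8] = "Everything below is bizarre."
alice[9] = "bees are green"
bob = list(base)
bob[9] = "bees are green"
bob.append("submarines climb trees.")

merged, conflict = merge3(base, alice, bob)
print(conflict)
print("\n".join(merged[8:10] + merged[-1:]))
```

In this toy version, Bob's line-10 change drops out because Alice made the same change, Alice's line 9 is kept because Bob left it alone, and Bob's new last line is added because Alice left the end of the file alone.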

    That's what a merge base is (and does) for a file. The rename case comes in when, and for Git only when, Git can diff an entire tree; that is, it needs the commit as a whole (or at least the tree object attached to the commit). Here git apply is out of its depth, since it works on one file at a time. (The git am code might be able to deal with it if the incoming patch has "rename" instructions within it, but I don't think that's in Git, though I admit to not having looked lately.)
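For the practical side of the question, here is a minimal sketch of the git apply -3 route driven from a Python script. It shells out with subprocess instead of using GitPython's own API. The helper name apply_commit_3way is made up, and the sketch assumes a git executable on PATH and a commit with exactly one parent. GitPython's repo.git object forwards method calls to the git command line, so the same two commands should be reachable from there as well, though I have not checked that here.

```python
import subprocess


def apply_commit_3way(repo_dir, commit):
    """Apply the changes of `commit` (relative to its parent) to the current
    index and working tree, using `git apply -3` for a three-way fallback."""
    # --full-index writes complete blob hashes on the `index` lines, so the
    # "before" blobs can be looked up unambiguously.
    patch = subprocess.run(
        ["git", "diff", "--full-index", f"{commit}^", commit],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    ).stdout
    # With no file argument, `git apply` reads the patch from standard input.
    subprocess.run(["git", "apply", "-3"], cwd=repo_dir, input=patch, check=True)
```

Unlike git cherry-pick, this only updates the index and working tree; it does not create a commit or copy the original commit message. If git apply exits with a non-zero status, for example because of conflicts, subprocess raises CalledProcessError.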