Tags: git, cherry-pick, merge-strategy

Replace whole state of application with state of another commit


I'd like to do "the hardest version" of cherry-pick/merge/rebase/checkout, meaning that the state of the app on my branch ends up looking exactly like it does in the cherry-picked commit (while keeping the history of my branch). I could duplicate my repo, delete everything in my branch, and then copy over the whole content of the duplicate checked out at the needed commit. But that's not handy, and I believe there's an easier way.

I already tried git cherry-pick <hash> --strategy-option theirs, but that's not perfect, because it doesn't remove files that don't exist in the cherry-picked commit, which results in a big mess in my case.

So, how can I do this?

Edit: I clarified that I also need to keep my history, which was not obvious at first.


Solution

  • That's not a cherry-pick at all. Don't use git cherry-pick to make it: use git commit. Here's a very simple recipe [1]:

    $ git checkout <branch>                # get on the target branch
    $ cd $(git rev-parse --show-toplevel)  # ensure you're at the top of the work-tree
    $ git rm -r .                          # remove all tracked files from index and work-tree
    $ git checkout <hash> -- .             # create every file anew from <hash>
    $ git commit                           # make a new commit with all new info
    

    If you want to copy the commit message and such from commit <hash>, consider adding -c <hash> to the git commit line.
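
The recipe above can be tried safely in a throwaway repository. This is a minimal sketch, not part of the original answer: the temporary directory, the file names, the demo identity, and the commit messages are all made up for illustration.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name regardless of init.defaultBranch
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo base > shared.txt
git add -A && git commit -q -m "G: common base"

git checkout -q -b somebranch
echo changed-in-K > shared.txt
echo from-K > only-in-K.txt
git add -A && git commit -q -m "K: snapshot we want"
target=$(git rev-parse HEAD)               # plays the role of <hash>

git checkout -q master
echo from-H > only-in-H.txt                # a file that K does not have
git add -A && git commit -q -m "H: current tip of master"
old_tip=$(git rev-parse HEAD)

# The recipe itself
cd "$(git rev-parse --show-toplevel)"      # top of the work-tree
git rm -r -q .                             # empty the index and work-tree
git checkout "$target" -- .                # refill both from K
git commit -q -m "I: same snapshot as K, on top of H"

new_tree=$(git rev-parse 'HEAD^{tree}')
target_tree=$(git rev-parse "${target}^{tree}")
new_parent=$(git rev-parse 'HEAD^')
echo "tree of new commit: $new_tree"
echo "tree of target:     $target_tree"
```

If the two printed tree IDs are equal, the new commit holds exactly the target's snapshot, and its parent is still the old tip of master.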


    [1] This is not the simplest recipe, but it should be understandable. The simpler ones use plumbing commands after the initial git checkout, e.g.:

    git read-tree -m -u <hash>
    git commit
    

    or:

    git merge --ff-only $(git commit-tree -p HEAD <hash>^{tree} < /tmp/commit-msg)
    

    (Both are untested, and for the second one you'll have to write a commit message into /tmp/commit-msg first.)
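
The commit-tree variant can be sketched the same way in a scratch repository. This is an illustration under assumptions: the repository layout, file names, and identity are hypothetical, and the commit message is piped in on standard input instead of being read from /tmp/commit-msg.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo base > shared.txt
git add -A && git commit -q -m "G: common base"

git checkout -q -b somebranch
echo from-K > only-in-K.txt
git add -A && git commit -q -m "K: snapshot we want"
target=$(git rev-parse HEAD)

git checkout -q master
echo from-H > only-in-H.txt
git add -A && git commit -q -m "H: current tip of master"
old_tip=$(git rev-parse HEAD)

# Build commit I directly: parent is the current tip, tree is K's tree.
new_commit=$(printf 'I: snapshot of K via commit-tree\n' |
  git commit-tree -p HEAD "${target}^{tree}")

# Fast-forward master to it; this also updates the index and work-tree.
git merge -q --ff-only "$new_commit"

head_now=$(git rev-parse HEAD)
parent_now=$(git rev-parse 'HEAD^')
echo "master now points to $head_now (parent $parent_now)"
```

Nothing is staged by hand here: git commit-tree reuses the tree object that K already has, which is why this version needs no git rm step.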

    Long

    Remember that Git stores commits, with each commit being a complete snapshot of all source files, plus some metadata. The metadata for each commit includes the name and email address of whoever makes the commit; a date-and-time-stamp for when the commit was made; a log message to say why the commit was made; and, crucially for Git, the hash ID of the parent of the commit.
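
To see this metadata for yourself, git cat-file -p prints a raw commit object. A small sketch in a scratch repository (the file name, identity, and messages are made up for the demo):

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo one > file.txt
git add -A && git commit -q -m "first commit"
first=$(git rev-parse HEAD)

echo two > file.txt
git add -A && git commit -q -m "second commit"

# Prints the tree (snapshot) line, the parent line, author and committer
# lines with timestamps, a blank line, and then the log message.
commit_object=$(git cat-file -p HEAD)
printf '%s\n' "$commit_object"
```

The parent line of the second commit carries the hash ID of the first one, which is the backwards link discussed next.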

    Whenever something holds the hash ID of a commit, we say that it points to that commit. So if one commit stores the hash ID of another commit, the first commit points to the second.

    What this means is that these hash IDs, embedded within each commit, form a backwards-looking chain. If we use single letters to stand in for commits, or number them C1, C2, and so on in sequence, we get:

    A <-B <-C <-D ... <-G <-H
    

    or:

    C1 <-C2 <-C3 ... <-C7 <-C8
    

    The actual name of each commit is of course some big ugly hash ID, but using letters or numbers like this makes it much easier for us, as humans, to deal with them. In any case, the key is that if we somehow save the hash ID of the last commit in the chain, we end up with the ability to follow the rest of the chain backwards, one commit at a time.

    The place we have Git store these hash IDs is in branch names. So a branch name like master just stores the real hash ID of commit H, while H itself stores the hash ID of its parent G, which stores the hash ID of its parent F, and so on:

    ... <-F <-G <-H   <-- master
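
The backwards walk from a branch name can be reproduced with ordinary commands. A minimal sketch, assuming a scratch repository with three hypothetical commits named F, G, and H:

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

for name in F G H; do
  echo "$name" > file.txt
  git add -A
  git commit -q -m "$name"
done

tip=$(git rev-parse master)                # the hash ID stored in the branch name
parent=$(git rev-parse 'master^')          # one step back
grandparent=$(git rev-parse 'master~2')    # two steps back

# Each line shows a commit and the parent hash it stores.
git log --format='%h %s (parent: %p)' master

chain=$(git log --format=%s master | xargs)   # subjects, newest first, on one line
echo "$chain"
```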
    

    These backwards-looking links, from H to G to F, plus the snapshots saved with each commit plus the metadata about who made the commit and why, are the history in your repository. To retain the history that ends in H, you simply need to make sure that the next commit, when you make it, has H as its parent:

    ...--F--G--H--I   <-- master
    

    By making the new commit, Git changes the name master to remember the hash ID of new commit I, whose parent is H, whose parent is (still) G, and so on.

    Your goal is to make commit I using the snapshot that's associated with some other commit, such as K below:

    ...--F--G--H   <-- master
       \
        J------K------L   <-- somebranch
    

    Git actually builds new commits out of whatever is in the index, rather than what's in the source tree. So we start with git checkout master to make commit H the current commit and master the current branch, which fills in the index and work-tree from the contents of commit H.
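
A quick way to convince yourself that commits come from the index and not from the work-tree is to stage one version of a file and then edit it again before committing. A sketch with made-up file contents in a scratch repository:

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo original > file.txt
git add file.txt && git commit -q -m "first"

echo staged > file.txt
git add file.txt                           # the index now holds "staged"
echo unstaged > file.txt                   # the work-tree moves on without git add

git commit -q -m "second"                  # commits what is in the index

committed=$(git show HEAD:file.txt)
in_worktree=$(cat file.txt)
echo "committed: $committed, work-tree: $in_worktree"
```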

    Next, we want the index to match commit K, with no files other than those that are in K, so we start by removing every file from the index. For sanity (i.e., so that we can see what we're doing) we let Git do the same to the work-tree, which it does automatically. So we run git rm -r . after making sure that . refers to the entire index / work-tree pair, by making sure we're at the top of the work-tree and not in some sub-directory / sub-folder.

    Now only untracked files remain in our work-tree. In most cases they're harmless, but we can remove them too if we like, using plain rm or git clean. Then we need to fill in the index from commit K (the work-tree once again comes along for the ride), so we run git checkout <hash-of-K> -- . (the final dot is part of the command). The -- . is important: it tells Git not to switch commits, but just to extract everything from the commit named here. Our index and work-tree now match commit K.

    (If commit K has all files that we have in H, we could skip the git rm step. We only need the git rm to remove files that are in H but are not in K.)
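
The following sketch shows what goes wrong when the git rm step is skipped even though H has a file that K lacks. The repository, file names, and identity are hypothetical demo data.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo base > shared.txt
git add -A && git commit -q -m "G: common base"

git checkout -q -b somebranch
echo changed-in-K > shared.txt
git add -A && git commit -q -m "K: snapshot we want"
target=$(git rev-parse HEAD)

git checkout -q master
echo from-H > only-in-H.txt                # exists in H, not in K
git add -A && git commit -q -m "H: current tip of master"

# Skip git rm and extract K's files straight over H's.
git checkout "$target" -- .

leftover=$(git ls-files -- only-in-H.txt)  # still tracked in the index
if git diff --cached --quiet "$target"; then
  index_matches_K=yes
else
  index_matches_K=no
fi
echo "leftover: $leftover, index matches K: $index_matches_K"
```

The checkout only adds and overwrites paths that exist in K, so the H-only file survives in the index and would end up in the new commit.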

    Last, now that we have the index (and work-tree) matching commit K, we're safe to make a new commit that is like K but does not connect to K.

    If you want a merge, use git merge --no-commit

    The above sequence results in:

    ...--F--G--H--I   <-- master
       \
        J-------K-----L   <-- somebranch
    

    where the saved source snapshot in commit I exactly matches that in commit K. However, the history produced by reading master, finding that it points to I, and then reading commit I and on backwards to H and G and F and so on, never mentions commit K at all.
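
Both halves of that statement can be checked with git diff and git merge-base. A minimal sketch, again in a scratch repository with hypothetical file names and identity:

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo base > shared.txt
git add -A && git commit -q -m "G: common base"
git checkout -q -b somebranch
echo from-K > only-in-K.txt
git add -A && git commit -q -m "K: snapshot we want"
target=$(git rev-parse HEAD)
git checkout -q master
echo from-H > only-in-H.txt
git add -A && git commit -q -m "H: current tip of master"

# Make commit I with the remove-then-checkout recipe.
git rm -r -q .
git checkout "$target" -- .
git commit -q -m "I: same snapshot as K"

# Same snapshot?
if git diff --quiet HEAD "$target"; then same_snapshot=yes; else same_snapshot=no; fi
# Is K reachable by walking backwards from I?
if git merge-base --is-ancestor "$target" HEAD; then k_in_history=yes; else k_in_history=no; fi
echo "same snapshot: $same_snapshot, K in history of master: $k_in_history"
```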

    You might instead want a history that looks like this:

    ...--F--G--H--I   <-- master
       \         /
        J-------K-----L   <-- somebranch
    

    Here, commit I reaches back to both commits H and K.

    Making this variant of commit I is a little trickier, because aside from using the git commit-tree plumbing command, the only way to make commit I is to use git merge.

    Here, the easy way is to run git merge -s ours --no-commit, as in:

    $ git merge -s ours --no-commit <hash>  # start the merge but don't finish it
    $ git rm -r .                           # discard everything
    $ git checkout <hash> -- .              # replace with their content
    $ git commit                            # and now finish the merge
    

    We use -s ours here to make things go faster and more smoothly. What we're building is really the result of git merge -s theirs, except for the fact that there is no git merge -s theirs. The -s ours means "ignore their commit, just keep the contents from our commit H." Then we throw that out and replace it with the content from their commit K, and then we finish the merge to get a merge commit I that points to both H and K.
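
Here is the merge variant run end to end in a scratch repository, followed by a check that commit I really has two parents and K's snapshot. As in the other sketches, the repository, file names, identity, and messages are assumptions made for the demo.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo base > shared.txt
git add -A && git commit -q -m "G: common base"
git checkout -q -b somebranch
echo from-K > only-in-K.txt
git add -A && git commit -q -m "K: snapshot we want"
target=$(git rev-parse HEAD)
git checkout -q master
echo from-H > only-in-H.txt
git add -A && git commit -q -m "H: current tip of master"
old_tip=$(git rev-parse HEAD)

git merge -q -s ours --no-commit "$target" # start the merge, keep H's content for now
git rm -r -q .                             # discard everything
git checkout "$target" -- .                # replace with K's content
git commit -q -m "I: merge K, taking K's snapshot"

first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
merged_tree=$(git rev-parse 'HEAD^{tree}')
echo "parents: $first_parent $second_parent"
```

Because K is now the second parent, git merge-base --is-ancestor reports it as part of master's history, unlike in the non-merge recipe.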

    As before, there are plumbing command tricks that make this even easier. They're just not obvious unless you understand the low level storage format that Git uses internally. The "remove everything, then check out a different commit's contents" method is really obvious, and is easy to remember.