I have a repo in which I have two branches, master and master-old, which was created as an orphan branch.
Now I want to rebase the entirety of master onto master-old, but the tree of each commit should stay unchanged, i.e. the working copies of each commit on master and master-old should look exactly the same way before and after the rebase.
Current state
-------------
A - B - C - D <--- master
E - F - G - H <--- master-old
Desired state
-------------
E'- F'- G'- H'- A'- B'- C'- D' <--- master
I tried to accomplish this using git rebase --onto master-old --root. The problem is that both the initial commit on master and the entire commit history of master-old create a lot of the same files, so I get a huge number of conflicts to resolve.
Is there a way to rewrite history in a way that keeps the tree of each commit intact?
Given that you want to retain the trees associated with the original A--B--C--D series of commits, you don't really want to rebase after all. Rebasing implies turning commits into diffs (changesets) and then applying those changesets, one at a time, to some existing starting point, but all you want to do is to copy the tree that's attached to A to your new commit A' whose parent is H, then copy the tree attached to B to the new commit B' whose parent is A', and so on.
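To make the "copy the tree" idea concrete, here is a minimal sketch that does it by hand with the plumbing command git commit-tree in a throwaway repository. The file name, contents, commit messages and the two-commit histories are made up for the demo, and this version does not preserve the original author or dates, so treat it as an illustration rather than the recommended procedure.

```shell
#!/bin/sh
# Scratch-repo demo: re-create master's commits on top of master-old by
# reusing each commit's tree, without computing or applying any diffs.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# Two unrelated histories that both create file.txt (made-up content).
git checkout -q --orphan master-old
echo "old 1" > file.txt && git add file.txt && git commit -q -m E
echo "old 22" > file.txt && git add file.txt && git commit -q -m F
git checkout -q --orphan master
echo "new 333" > file.txt && git add file.txt && git commit -q -m A
echo "new 4444" > file.txt && git add file.txt && git commit -q -m B

# Walk master oldest-first. Each copy keeps the original tree and message
# and takes the previous copy (initially the tip of master-old) as parent.
# Author and dates are NOT carried over in this sketch.
parent=$(git rev-parse master-old)
for c in $(git rev-list --reverse master); do
    parent=$(git log -1 --format=%B "$c" |
             git commit-tree "$c^{tree}" -p "$parent")
done

# "rebuilt" is a hypothetical branch name for the result.
git branch rebuilt "$parent"
git log --format='%h %s' rebuilt
```

No conflicts can occur here because nothing is ever merged or patched: each new commit simply points at an existing tree object.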
This is where git filter-branch works well. When you run:
git filter-branch <filter-list> <branch-name>
Git finds every commit reachable from the given <branch-name>, and then copies each of these commits. The copy is done, logically speaking anyway, by extracting the entire commit as-is, running each of the filters in your <filter-list>, and then making a new commit using the resulting tree and message. It runs through the copying process in the reverse of Git's normal order, i.e., "forwards through history", instead of backwards.
If the new commit (with its maybe-altered-maybe-not tree, maybe-altered-maybe-not parent, maybe-altered-maybe-not message, etc.) is 100% bit-for-bit identical to the original commit, the new commit's hash ID is unchanged. In that case, the default "new parent" for the next commit is the same as the original parent. Otherwise the default "new parent" for the next commit is the one we just made.
(In practice, because the commit graph can diverge and merge again and because you can skip commits or add new commits, what filter-branch really does is to make a mapping of old commit hash to new commit hash. Each time it makes a copy, it enters a pair: <old-hash, new-hash> into this mapping. For a simple linear chain, though, you can think of this as just remembering the most recent commit's new hash ID.)
Now, the issue you have here is that you want to change the parent hash ID of one specific commit, namely the root commit. There's a filter specifically for that, the --parent-filter. There are two more ways to do this but let's describe the --parent-filter first. This is from the git filter-branch documentation:
--parent-filter <command>
This is the filter for rewriting the commit's parent list. It will receive the parent string on stdin and shall output the new parent string on stdout. The parent string is in the format described in git-commit-tree(1): empty for the initial commit, "-p parent" for a normal commit and "-p parent1 -p parent2 -p parent3 ..." for a merge commit.
Hence, you could test whether stdin is empty, and if so, output -p <hash-of-H>. The result would be:
E--F--G--H--A'-B'-C'-D' <-- master
(not quite what you asked for, but maybe even better).
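As a rough sketch of the parent-filter approach, the script below builds a small scratch repository and grafts master onto master-old. The sed expression follows the pattern shown in the git filter-branch documentation; the file name, contents and two-commit histories are assumptions made for the demo, and FILTER_BRANCH_SQUELCH_WARNING is only there to silence the warning that newer Git versions print.

```shell
#!/bin/sh
# Scratch-repo demo of grafting master onto master-old with --parent-filter.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer Git warns and pauses otherwise
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# Two unrelated histories that both create file.txt (made-up content).
git checkout -q --orphan master-old
echo "old 1" > file.txt && git add file.txt && git commit -q -m E
echo "old 22" > file.txt && git add file.txt && git commit -q -m F
git checkout -q --orphan master
echo "new 333" > file.txt && git add file.txt && git commit -q -m A
echo "new 4444" > file.txt && git add file.txt && git commit -q -m B

old_tip=$(git rev-parse master-old)       # plays the role of <hash-of-H>
trees_before=$(git log --format=%T master)

# The root commit's parent string is empty, so sed rewrites that empty line
# to "-p <old_tip>". Non-empty parent strings do not match and are left alone.
git filter-branch --parent-filter "sed 's/^\$/-p $old_tip/'" master >/dev/null

git log --format='%h %s' master
```

filter-branch keeps a backup of the old branch tip under refs/original/, so you can compare before and after, or undo the rewrite, until you delete that ref.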
(To get the E-F-G-H chain copied you'd have to pass master-old as a positive reference as well, and since any bit-for-bit identical commit necessarily has the same hash ID as the original, you would have to make at least one change to commit E, such as changing the committer timestamp by one second.)
The other two ways to do this are worth mentioning here. One is to use the --commit-filter: this is the command that actually makes the new commit. You can do anything here, including omit some commits entirely; but the reason for all the other filters is to make things easier, so in this case there's no reason to use the commit filter at all.
git replace
Finally, there's the git replace command. What git replace does is to make new objects that stay in the repository, referenced by a special name in the refs/replace/ name-space. Whenever Git goes to look at some object by its hash ID, Git normally first checks to see if refs/replace/<hash-id> exists. If so, Git looks instead at the object to which that reference points.
What this means is that you can construct a new Git object that is very much like commit A, but slightly different. The slight difference is that the new commit object has one parent hash ID stored in it. The parent hash ID is that of commit H. (Note that it has the same tree as A.)
Now that you have this new object (let's call it A'), you stick it into the repository and make refs/replace/<big-ugly-hash> point to it:
           A--B--C--D <-- master
E--F--G--H <-- master-old
          \
           A' <-- refs/replace/deadcabf001...
(based on A's actual hash, which probably isn't really deadcabf001..., so use the right ID here instead).
When git log goes to view the history starting from commit D, it will look at commit D, then get D's parent ID C, look at commit C, get B's ID and move on to commit B, get A's ID and ... whoa, hey, there's a refs/replace/ for this one! Let's not look at A after all! Let's look at A'! It shows you A' as B's parent, then moves on to the parent of A' and shows you H, and then G, and so on.
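One way to build such a replacement without crafting the commit object by hand is git replace --graft, available in reasonably recent Git versions. The sketch below tries it in a scratch repository; the file name, contents and two-commit histories are made up for the demo.

```shell
#!/bin/sh
# Scratch-repo demo: give master's root commit a parent via git replace.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# Two unrelated histories that both create file.txt (made-up content).
git checkout -q --orphan master-old
echo "old 1" > file.txt && git add file.txt && git commit -q -m E
echo "old 22" > file.txt && git add file.txt && git commit -q -m F
git checkout -q --orphan master
echo "new 333" > file.txt && git add file.txt && git commit -q -m A
echo "new 4444" > file.txt && git add file.txt && git commit -q -m B

tip_before=$(git rev-parse master)
first=$(git rev-list --max-parents=0 master)   # the root commit, A

# Create A' (same tree and message as A, parent = tip of master-old)
# and point refs/replace/<hash-of-A> at it.
git replace --graft "$first" "$(git rev-parse master-old)"

git for-each-ref refs/replace/
git log --format='%h %s' master                        # walks B, A', F, E
git --no-replace-objects log --format='%h %s' master   # original: B, A
```

Note that master itself still points at the same commit hash as before; only the way Git resolves the root commit has changed.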
When you use git replace you do not have to copy any of the other commits. What you have is a commit history in which the new "better" commit supplants the old "not-so-good" one, but both actually coexist. Git uses the replacement under these conditions:
1. the replacement object exists in the repository;
2. there is a refs/replace/hash in the references; and
3. you are not running git --no-replace-objects.
Requirement 3 lets you see the original (unreplaced) history, if you like. Item 2 means that on git clone, you don't get replacements, by default. You must explicitly ask for them (which is not hard but does not have any nice easy front-end either).
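To illustrate the point about clones, this sketch sets up a scratch repository with a replacement, clones it, and then fetches the refs/replace/ name-space explicitly with a refspec. The repository layout, file name and contents are assumptions for the demo; the refspec is one possible way to ask for the replacements, not the only one.

```shell
#!/bin/sh
# Scratch-repo demo: a plain clone does not copy refs/replace/*.
set -e
src=$(mktemp -d) && cd "$src" && git init -q .
git config user.name demo && git config user.email demo@example.com

# Two unrelated histories that both create file.txt (made-up content).
git checkout -q --orphan master-old
echo "old 1" > file.txt && git add file.txt && git commit -q -m E
echo "old 22" > file.txt && git add file.txt && git commit -q -m F
git checkout -q --orphan master
echo "new 333" > file.txt && git add file.txt && git commit -q -m A
echo "new 4444" > file.txt && git add file.txt && git commit -q -m B

# Graft master's root commit onto master-old in the source repository.
first=$(git rev-list --max-parents=0 master)
git replace --graft "$first" "$(git rev-parse master-old)"

clone=$(mktemp -d)
git clone -q "$src" "$clone"

# Without the replacement ref, the clone sees only the original history.
before=$(git -C "$clone" rev-list --count origin/master)

# Ask for the replacement refs explicitly.
git -C "$clone" fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git -C "$clone" rev-list --count origin/master)

echo "commits reachable from master: $before before, $after after the fetch"
```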
Because of item 2 above, you might want to make a replacement, make sure it all works the way you like, and then run git filter-branch. Since you are not running git --no-replace-objects filter-branch, Git will see the replacement commit A' instead of the original commit A. It will therefore copy A' instead of A. You won't need a --parent-filter. When it copies E through H, the new copies will be bit-for-bit identical to the originals, so those will stick around unchanged. The final result will be the same as if you had run git filter-branch with the correct parent-filter.
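Putting the two steps together, a minimal sketch of "replace first, then make it permanent" could look like the following. It runs in a scratch repository with made-up file contents and two-commit histories; FILTER_BRANCH_SQUELCH_WARNING is only there to silence the warning newer Git versions print, and deleting the replacement ref at the end is optional cleanup.

```shell
#!/bin/sh
# Scratch-repo demo: graft with git replace, then bake it in with filter-branch.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer Git warns and pauses otherwise
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# Two unrelated histories that both create file.txt (made-up content).
git checkout -q --orphan master-old
echo "old 1" > file.txt && git add file.txt && git commit -q -m E
echo "old 22" > file.txt && git add file.txt && git commit -q -m F
git checkout -q --orphan master
echo "new 333" > file.txt && git add file.txt && git commit -q -m A
echo "new 4444" > file.txt && git add file.txt && git commit -q -m B

old_tip=$(git rev-parse master-old)
trees_before=$(git log --format=%T master)
first=$(git rev-list --max-parents=0 master)   # the root commit, A

# Step 1: temporary graft; inspect it with git log before going further.
git replace --graft "$first" "$old_tip"

# Step 2: no filters needed. filter-branch honors refs/replace/, so the
# rewritten master really has master-old's tip as an ancestor.
git filter-branch master >/dev/null

# Step 3: the replacement is no longer needed once history is rewritten.
git replace -d "$first" >/dev/null

git --no-replace-objects log --format='%h %s' master
```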