I have a file myfile.txt
Line one
Line two
Line four
where each line has been added in a separate commit.
I edit the file to add a "missing" line, so the file is now
Line one
Line two
Line three
Line four
This bash script sets up the repository:
#!/bin/bash
mkdir -p ~/testrepo
cd ~/testrepo || exit
git init
echo 'Line one' >> myfile.txt
git add myfile.txt
git commit -m 'First commit'
echo 'Line two' >> myfile.txt
git commit -m 'Second Commit' myfile.txt
echo 'Line four' >> myfile.txt
git commit -m 'Third commit' myfile.txt
sed -i '/Line two/a Line three' myfile.txt
git commit --fixup=HEAD^ myfile.txt
The history looks like this
$ git --no-pager log --oneline
90e29ee (HEAD -> master) fixup! Second Commit
6a20f1a Third commit
ac1564b Second Commit
d8a038d First commit
I run an interactive rebase to combine the fixup commit into "Second Commit", but it reports a merge conflict:
$ git rebase -i --autosquash HEAD^^^
Auto-merging myfile.txt
CONFLICT (content): Merge conflict in myfile.txt
error: could not apply 90e29ee... fixup! Second Commit
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 90e29ee... fixup! Second Commit
$ git --no-pager diff
diff --cc myfile.txt
index b8b933b,43d9d5b..0000000
--- a/myfile.txt
+++ b/myfile.txt
@@@ -1,2 -1,4 +1,7 @@@
Line one
Line two
++<<<<<<< HEAD
++=======
+ Line three
+ Line four
++>>>>>>> 90e29ee... fixup! Second Commit
The desired history would be
xxxxxxx (HEAD -> master) Third commit
xxxxxxx Second Commit
d8a038d First commit
where "Second Commit" looks like this:
diff --git a/myfile.txt b/myfile.txt
index e251870..802f69c 100644
--- a/myfile.txt
+++ b/myfile.txt
@@ -1 +1,3 @@
Line one
+Line two
+Line three
What you've run into here is basically an edge case in merges. You just have to fix these manually. You might wonder why I'm talking about merges when you're not running git merge. For that, see the long answer below.
What git rebase does is to copy (some) commits. When using the interactive rebase, git rebase -i, you can fiddle with the copying process. When using --autosquash, Git itself fiddles with the copying process. This fiddling can lead to the problem you encountered. Even without any fiddling, you can still get conflicts. Let's explore this.
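We need to start with a brief overview of commits. Each commit is identified by a unique hash ID, holds a full snapshot of every file, and carries metadata such as the author, the log message, and the hash ID(s) of its parent commit(s).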
The parent commit hash ID(s) inside each commit form commits into backwards-looking chains. For instance, if we represent a simple linear chain of commits using single uppercase letters to stand in for hash IDs, we get a drawing like this:
... <-F <-G <-H
where H stands in for the hash ID of the last commit in the chain. That commit contains both a snapshot and the hash ID of earlier commit G. We say that H points to G. G in turn points to F, which points back still earlier.
Because commits hold snapshots, not changes, we need to have Git compare two snapshots in order to find changes. This is like playing a game of spot the difference. To do this, we can run git diff and give it two raw commit hash IDs, or we can run git show on a single commit, which compares the commit with its (single) parent. (The effect on merge commits, which are commits with two or more parents, is trickier.)
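As a minimal sketch of the two ways to compare snapshots, the script below builds a throwaway repository (the temporary directory, the placeholder identity, and the variable names are assumptions for illustration) and then views the same change both ways:

```shell
#!/bin/bash
# Throwaway repository; the identity below is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

echo 'Line one' > myfile.txt
git add myfile.txt
git commit -q -m 'First commit'
echo 'Line two' >> myfile.txt
git commit -q -m 'Second commit' myfile.txt

parent=$(git rev-parse HEAD^)
child=$(git rev-parse HEAD)

# Compare two snapshots by naming both commits explicitly ...
explicit=$(git --no-pager diff --no-color "$parent" "$child")
# ... or let git show compare the commit with its single parent.
implicit=$(git --no-pager show --no-color "$child")

printf '%s\n' "$explicit"
```

Both forms report the added line; git show additionally prints the commit's metadata ahead of the patch.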
Because commits are found by their hash IDs, and the hash IDs are cryptographic checksums, we cannot change anything about any existing commit. If some commit is deficient in some way, the best we can do is extract it, fix it up, and put into Git a new commit: the different content will result in a new unique hash ID for the new commit. The existing commit will remain unchanged.
Because a commit contains the hash ID of its parent, if we "change" (i.e., copy) any commit, we're forced to "change" (copy) all subsequent commits as well. So any re-ordering of a commit, or any fixing of any broken-ness about any commit—including just fixing up its log message—has a ripple effect. That's not really a big deal: most commits are pretty cheap. The fact that Git re-uses (de-duplicates) files in snapshots, and even de-duplicates entire snapshots, means that changing part of a commit—such as its log message—without changing its snapshot hardly needs any disk space at all.1 So we need not worry about disk space, normally.
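To illustrate that "changing" a commit really means making a new one, here is a small sketch in a throwaway repository (the directory, identity, and variable names are placeholders). It rewords a commit and compares the commit hash and the snapshot (tree) hash before and after:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

echo 'Line one' > myfile.txt
git add myfile.txt
git commit -q -m 'First commit'

old_commit=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# Reword the log message only; the files are untouched.
git commit -q --amend -m 'First commit, reworded'

new_commit=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')

echo "commit: $old_commit -> $new_commit"
echo "tree:   $old_tree -> $new_tree"
```

The commit hash changes because the message is part of what gets hashed, while the tree hash stays the same, which is why the reworded copy shares its snapshot with the original.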
We do need to worry about other things when rebasing: in particular, we must worry about the copies of those commits that other Git repositories have. But if no other Git repository has those commits, that worry just falls aside as well. Overall, rebasing is really quite safe when we're just working with a private repository, or when we haven't sent our commits out to anyone else. Even if the whole process goes wrong, our original commits are still in Git. (It can, however, become a real chore to find the originals. When you have 47 people that look alike, and all of them claim to be Bruce, which Bruce is the original Bruce? So be sure to keep careful track, if you do this sort of thing.)
1Any commits that are completely abandoned in this process tend to linger for at least 30 days, but are then cleaned away automatically.
A branch name mainly just holds the hash ID of the last commit in some chain. That is, when we have:
...--G--H <-- branch1
what the name branch1 is doing for us is remembering hash ID H. That way, we don't have to memorize it, or write it on a whiteboard, or whatever. If we now create a second branch name branch2, that name also points to commit H:
...--G--H <-- branch1, branch2
We attach the special name HEAD to one (and only one) branch name, to denote which name, and thus which commit, we're using:
...--G--H <-- branch1 (HEAD), branch2
Now we make some new commits. The first new commit, which we'll call I, will point back to the currently-last commit H, and Git will write I's hash ID into the name to which HEAD is attached:
I <-- branch1 (HEAD)
/
...--G--H <-- branch2
If we make a second commit on branch1, then git checkout branch2 or git switch branch2 to attach HEAD to branch2 and make H the current commit, we get:
I--J <-- branch1
/
...--G--H <-- branch2 (HEAD)
Making two more commits on the now-current branch2 gives us:
I--J <-- branch1
/
...--G--H
\
K--L <-- branch2 (HEAD)
We can now use git merge. If we first git checkout branch1, J will be the current commit and we'll git merge branch2 to combine work with commit L. If we just git merge branch1, L will be the current commit and we'll combine work with commit J. The merge effect is mostly symmetric here, but the final merge commit will extend whichever branch we are actually on, so let's git checkout branch1 first:
git checkout branch1 && git merge branch2
Git will now find the best shared commit (the best of the commits that are on both branches) to act as the merge base for this merge operation. In this case, the best shared commit is obvious: it's commit H. Commit G and all earlier commits are on both branches, but H is "better" because it's closer to the end.
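You can ask Git which commit it would pick as the merge base. The sketch below rebuilds the H, I-J, K-L graph in a throwaway repository; make_commit is a hypothetical helper, and each commit just adds a file named after itself so that the letters are easy to follow:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

# Hypothetical helper: one commit per call, each adding its own file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit H
git branch -M branch1          # name the initial branch branch1
git branch branch2             # branch2 also points to H
make_commit I; make_commit J   # extend branch1
git checkout -q branch2
make_commit K; make_commit L   # extend branch2

# Ask Git for the best shared commit of the two branches.
base=$(git merge-base branch1 branch2)
base_subject=$(git log -1 --format=%s "$base")
echo "merge base: $base_subject"
```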
To combine work, Git will now use git diff, just like we would, to find changes. Commit H has a snapshot, and commit J has one, and whatever is different between those two commits, well, that's what we did on branch1:
git diff --find-renames <hash-of-H> <hash-of-J> # what we changed
Repeating the diff but with commit L, the other commit, this time, shows what they (well, we) changed via commits K and L:
git diff --find-renames <hash-of-H> <hash-of-L> # what they changed
The merge process—what I like to call merge as a verb—now combines these two sets of changes. The combined changes not only do whatever we did, but also whatever they did. If we touched a file and they didn't, we get our stuff. If they touched a file and we didn't, we get their stuff. If we both touched some file, Git will attempt to combine those changes too.
Git will apply these combined changes to whatever is in the merge base commit. That is, suppose file F has, say, 100 lines in it, and we change something on line 42 and add a line at line 50, so that file F now has 101 lines. Suppose they change something on line 99. Git can make our change at line 42, add our line at line 50, and make their change to what was line 99 and is now line 100, and everything is just fine. Git will consider this the correct result of the merge.2
This process of combining changes and applying the combined changes to the merge base is, again, what I call merge as a verb. This produces a set of merged files. If there are no conflicts, these merged files are ready to commit.
The merging work actually takes place in Git's index aka staging area, though we won't go into any detail here. If there is a merge conflict, Git leaves all three input files in its index, and writes its best effort at merging into the working tree copy of the file. This working tree copy has merge conflict markers. This causes the merge-as-a-verb process to fail.
For git merge, if the merge-as-a-verb step succeeds, Git goes on to make a merge commit. A merge commit is almost exactly the same as a regular commit: it has a snapshot, like any commit, and it has a parent, like almost any commit. But it also has a second parent. That's what makes it a merge commit. This uses the word "merge" as an adjective, and Git often refers to these commits as a merge. So this is what I call merge as a noun.
Assuming all went well, we'd get:
I--J
/ \
...--G--H M <-- branch1 (HEAD)
\ /
K--L <-- branch2
The first parent of merge M would be commit J, because that's where branch1, our HEAD, was a moment ago. The second parent of merge M would be commit L.
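The parent order can be checked with the ^1 and ^2 suffixes. This is a sketch on the same kind of throwaway graph as in the drawing; make_commit is a hypothetical helper and each commit adds its own file, so the merge cannot conflict:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

# Hypothetical helper: one commit per call, each adding its own file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit H
git branch -M branch1
git branch branch2
make_commit I; make_commit J
git checkout -q branch2
make_commit K; make_commit L

# Merge branch2 into branch1; --no-edit accepts the default merge message.
git checkout -q branch1
git merge -q --no-edit branch2

first_parent=$(git log -1 --format=%s 'HEAD^1')
second_parent=$(git log -1 --format=%s 'HEAD^2')
echo "first parent: $first_parent, second parent: $second_parent"
```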
If the merge-as-a-verb process fails, git merge stops in the middle, and leaves the mess for you to clean up. It also exits with a nonzero status, for those running git merge from a script or program.
2Whether this actually is correct is a separate question, and not one that Git really cares about. Git is just following these simple textual substitution rules.
git cherry-pick
Now that we know how branching, branch names, and git merge work, we can look at git cherry-pick. Its function is to copy a commit, by figuring out what the commit does and "doing that again" as it were.
That is, suppose we have a situation like this:
I--J--K <-- feature1
/
...--H
\
L--M--N <-- feature2 (HEAD)
We're working on feature2 right now, and suddenly we notice: Hey, if we had commit J here after commit N, we'd be ready to finish. Ideally, we'd get someone to apply commit J to commit H (maybe on a new branch) and/or to merge commit J into something, so that we could use it more directly. But for whatever various reasons, we'd just like to get the change, from I to J, into feature2.
We could run:
git diff <hash-of-I> <hash-of-J>
to see what the change was, and then make the same changes ourselves, to whatever we have in our commit N, and make a new commit O. But why should we knock ourselves out doing this copying, when we have a computer that can do it? We run:
git cherry-pick <hash-of-J>
and Git does the copying. If all goes well, it even copies J's commit message for us, and makes a new commit. This new commit is a lot like J (diffing N vs this new commit will show the same changes as diffing I vs J), so instead of calling the new commit O, let's call it J':
I--J--K <-- feature1
/
...--H
\
L--M--N--J' <-- feature2 (HEAD)
That's all very nice, but here's what we need to know: The way git cherry-pick actually works is that it runs the Git merge machinery. It sets commit I, the parent of J, as the merge base, and then it runs those two git diff commands: one from I to N, our current commit, to find what we changed, and one from I to J to find what they changed.
Git now combines these two sets of changes, keeping our changes to keep up with commit N, but adding their changes to get the effect of commit J too. The fact that commit I isn't even on our branch is irrelevant. Git uses the merge machinery to make this copy, and usually all works splendidly.
Having run the merge-as-a-verb process, Git goes on to make a regular ordinary single-parent commit. That's our J'. The commit's author, author-date, and log message get copied from commit J; we become the committer, and the commit date on the new commit is "now".
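Here is a sketch of that copy in a throwaway repository, using the same letters as the drawing. make_commit is a hypothetical helper and each commit adds its own file, so this particular cherry-pick cannot conflict:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

# Hypothetical helper: one commit per call, each adding its own file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit H
git branch -M feature1
git branch feature2
make_commit I; make_commit J; make_commit K
git checkout -q feature2
make_commit L; make_commit M; make_commit N

i=$(git rev-parse feature1~2)   # commit I
j=$(git rev-parse feature1~1)   # commit J

# Copy J onto feature2; the result is J'.
git cherry-pick "$j" >/dev/null

original_change=$(git --no-pager diff --no-color "$i" "$j")
copied_change=$(git --no-pager diff --no-color HEAD^ HEAD)
copied_subject=$(git log -1 --format=%s)
echo "J' has subject: $copied_subject"
```

Diffing N against J' gives the same patch as diffing I against J, and the log message is carried over.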
But: the merge-as-a-verb process can fail. It can have a merge conflict. This is what you are seeing with your --autosquash rebase.
We're almost ready to put the pieces together. We just need to know one more thing: git rebase works by copying commits, as if by using git cherry-pick. For some versions of git rebase, Git literally runs git cherry-pick. The most modern version of Git as of today has the cherry-picking built in to the rebase code, so that it doesn't have to run it separately, but the effect is the same. We can think of it as cherry-picking. Even the fixup and squash cases do this: they just alter the final make-a-new-commit step.
To accomplish a rebase, Git first lists out the commit hash IDs of all the commits that are to be copied. This listing-out process is considerably more complicated than it looks at first, but we get to ignore all the complications here, as none of them actually apply. In your case, you have four commits to worry about, and three of them will be copied, so let's draw that. We'll name the first one A. It's a root commit: a slightly special case, a commit with no parent. So, here is what you have:
A--B--C--D <-- master (HEAD)
To do the git rebase -i (whether or not there's any autosquash going on) Git first lists out each of the commits to copy. Using HEAD^^^, you tell Git that the commits not to copy start at A and work backwards. The commits it should copy are those starting from HEAD (i.e., master) and working backwards: D, C, B, and A. From that list, we throw out A-and-back, leaving D, C, and B.
Normally Git would copy these three in the B-C-D order. That would work. Git would copy B to a new and improved commit B', then copy C using B' as C's parent, then copy D using C', to produce:
B'-C'-D' <-- master (HEAD)
/
A--B--C--D [abandoned]
Each of these copy steps works as if by using git cherry-pick, using Git's detached HEAD mode. Git first checks out commit A using --detach:
A <-- HEAD
\
B--C--D <-- master
and now runs git cherry-pick with the hash of commit B.3 This copies B to B' using the merge engine, setting the "merge base" to commit A. Git compares commit A, the merge base, to itself because HEAD says to use A. This says not to change anything. Then Git compares commit A (merge base again) to commit B. This says to make the changes that result in commit B's snapshot. Git makes the changes that result in commit B's snapshot, and commits these as a regular (non-merge) commit B', re-using most of B's metadata:
A--B' <-- HEAD
\
B--C--D <-- master
Now Git cherry-picks commit C. Commit B is the parent of C, so it is the forced merge base. It exactly matches our HEAD commit B', so we have no changes to merge in; we pick up their changes and commit, resulting in an exact copy of C as C':
A--B'-C' <-- HEAD
\
B--C--D <-- master
We repeat with D to get D', and then rebase does its last step, which is to yank the name master off commit D and paste it onto the last commit just made, and re-attach HEAD:
A--B'-C'-D' <-- master (HEAD)
\
B--C--D [abandoned]
which is the same picture we drew before, just drawn a little differently.
3The rebase command is actually clever here: it realizes that copying B here, at commit A, produces a new commit that's really an exact copy of B except for the date-and-time-stamps. So instead of copying it, it just re-uses it in place. To defeat the cleverness (which is sometimes useful, on the rare occasion when you need new hash IDs) you can force git rebase to make copies anyway. For illustration purposes, we'll just pretend that git rebase is dumber, or that you have defeated the cleverness, but if you delve deep into rebase, know that it does do this.
We can, if we choose, tell git rebase -i to squash a commit into a previous commit, during this copying process. We just replace the word pick with the word squash in the instruction sheet that git rebase -i gives us to edit. Suppose we did that with commit C, for instance. Then after copying B to B', so that we have:
A--B' <-- HEAD
\
B--C--D <-- master
Git will do the git cherry-pick in mostly the same way as before, resulting in a situation in which the next commit would be C' as we showed before. But instead of just committing as normal, this commit step takes two special actions:
It writes the commit message from B (or B'; they're the same) into a temporary file, and adds the commit message from C. It adds a little text about this being a squash of two commits, as well. That's what you see in your editor when Git fires up your editor before actually writing out the new commit.
Instead of committing as usual, so that C' has B' as its parent, Git instructs the commit process to make the next commit have A as its parent.
The result at this point is:
B' [abandoned]
/
A--BC <-- HEAD
\
B--C--D <-- master
where BC has a snapshot that matches commit C, but the commit message you provide when you edit the file.
Rebase can then go on to cherry-pick D as usual, and move the branch name as usual. If you can't see the abandoned commits (including the abandoned B') then it might as well not exist,4 and you just have:
A--BC--D <-- master (HEAD)
and we don't really need to draw in the other abandoned commits either.
Note that if you use fixup instead of squash in your command-sheet, Git still does this squashing process. It just doesn't bother having you edit a new commit message. Instead of gathering together the commit messages from each of the various to-be-squashed commits/copies, it just drops the fixup's message entirely, keeping the previous commit's message. (You can combine fixups and squashes: if you have $S squashes and $F fixups, the combined message you edit will hold all $S messages, and none of the $F messages.)
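For contrast with the conflicting case in the question, here is a sketch of a fixup that goes through cleanly. The file names, messages, and the write helper are made up for illustration; the fixup touches a file that the later commit never touches, so the merge machinery has nothing to fight over. Setting GIT_SEQUENCE_EDITOR=true simply accepts the generated instruction sheet without opening an editor:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

# Hypothetical helper: write one line to a file and stage it.
write() { printf '%s\n' "$2" > "$1"; git add "$1"; }

write a.txt 'a';        git commit -q -m 'Add a'
write b.txt 'b';        git commit -q -m 'Add b'
write c.txt 'c';        git commit -q -m 'Add c'
write b.txt 'b, fixed'; git commit -q --fixup=HEAD^   # fixup for 'Add b'

# Accept the instruction sheet as generated by --autosquash.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash HEAD~3

git --no-pager log --format=%s
```

After the rebase there are three commits, the fixup's message is gone, and the corrected b.txt lives in the commit whose message is still 'Add b'.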
4Due to rebase cleverness, it might actually not exist. This process works even when rebase just re-uses commit B directly.
You added --autosquash. That makes git rebase automatically move the copying commands around (and then replace some with squash or fixup as well). Commit B remains in place, but commit D, which is its fixup, moves to just after B. Commit C is left at the end. Git is now going to:
copy B normally; then
copy D, as a squash-with-fixup, i.e., discard D's message when we make BD as a new commit; then
copy C normally.
So let's look at what we get when we're copying D. We have:
A--B' <-- HEAD
\
B--C--D <-- master
just as we did before. Now we run git cherry-pick on commit D. This uses commit C as the merge base. We get, as our changes, a diff from C to B'.
The diff from C to B' says to remove the line Line four from the merge base copy of the file; this is the third line, right after Line two. Meanwhile, the diff from C to D says to insert a new line, Line three, into the merge base copy of the file, right after Line two and just before Line four.
Both changes touch the same spot, immediately after Line two: one deletes the line that follows it, and the other inserts a new line in front of that same line. Git treats changes that overlap or abut like this as a conflict rather than guessing how to combine them. Git does the best it can with this file. It then fails the merge-as-a-verb process, stopping the rebase process in its tracks, and tells you to fix the mess.
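You can reproduce this single step by hand, without rebase. The sketch below rebuilds the question's history in a throwaway repository (using printf rather than sed to write the file), prints the two diffs that the merge machinery has to combine, and then attempts the cherry-pick of D on top of B. The variable names are placeholders:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

printf 'Line one\n' > myfile.txt
git add myfile.txt
git commit -q -m 'First commit'
printf 'Line one\nLine two\n' > myfile.txt
git commit -q -m 'Second Commit' myfile.txt
printf 'Line one\nLine two\nLine four\n' > myfile.txt
git commit -q -m 'Third commit' myfile.txt
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git commit -q --fixup=HEAD^ myfile.txt

b=$(git rev-parse HEAD~2)   # Second Commit
c=$(git rev-parse HEAD~1)   # Third commit, the parent of the fixup
d=$(git rev-parse HEAD)     # the fixup commit

git checkout -q --detach "$b"

echo '--- what we changed (C to B) ---'
git --no-pager diff --no-color "$c" "$b"
echo '--- what they changed (C to D) ---'
git --no-pager diff --no-color "$c" "$d"

# cherry-pick exits nonzero when the merge-as-a-verb step fails.
if git cherry-pick "$d" >/dev/null 2>&1; then
    outcome=clean
else
    outcome=conflict
fi
unmerged=$(git diff --name-only --diff-filter=U)
echo "cherry-pick outcome: $outcome"
```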
If you set merge.conflictStyle to diff3,5 your working tree copy of the file will contain not just the two conflicting changes that Git couldn't combine for some reason, but also the merge base version of the lines. In this case, that will only help slightly, but that might be enough. I like to have diff3 set.
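To see what the extra section looks like, this sketch sets the option for one throwaway repository only (so your global configuration is untouched), provokes the same conflict by cherry-picking the fixup onto "Second Commit", and prints the conflicted file:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false
git config merge.conflictStyle diff3   # repository-local setting

printf 'Line one\n' > myfile.txt
git add myfile.txt
git commit -q -m 'First commit'
printf 'Line one\nLine two\n' > myfile.txt
git commit -q -m 'Second Commit' myfile.txt
printf 'Line one\nLine two\nLine four\n' > myfile.txt
git commit -q -m 'Third commit' myfile.txt
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git commit -q --fixup=HEAD^ myfile.txt

d=$(git rev-parse HEAD)
git checkout -q --detach HEAD~2

# The cherry-pick is expected to stop with a conflict, so ignore its status.
git cherry-pick "$d" >/dev/null 2>&1 || true

# The section introduced by ||||||| holds the merge base version of the lines.
cat myfile.txt
```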
Once you fix the conflict (however you choose to fix it) Git takes your result as "the right answer" and makes the new BD combined commit, using whatever you told Git was the right way that the file should read. So now you have:
B' [abandoned]
/
A--BD <-- HEAD
\
B--C--D <-- master
Git is now supposed to cherry-pick commit C. This runs a merge with the merge base set to commit B. Our commit is BD, so "what we changed" is a diff of the B copy of the file with whatever you've done. Their commit is C, so "what they changed" is the diff from B to C, which says to add the line "line four" at line 3, between a line that says "line two" (at line 2) and the end of the file.
Unless you make the file end after two lines, with the second line reading "line two", Git is likely to have a problem combining "their" change with yours. So you will see a merge conflict. If you do make the file end there like that, Git will decide that the merge requires nothing at all, which will leave git rebase a bit puzzled: it will tell you that it seems as though there's no reason to cherry-pick commit C any more, and force you to choose whether to use git rebase --skip to skip it.
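Putting it together, one way to reach the history the question asks for is to resolve both conflicts by hand. The sketch below does that non-interactively in a throwaway repository; it assumes the rebase stops twice, once at the fixup and once at "Third commit", as described above. Setting GIT_SEQUENCE_EDITOR and GIT_EDITOR to true stands in for accepting the instruction sheet and the commit messages as offered:

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
git config commit.gpgsign false

printf 'Line one\n' > myfile.txt
git add myfile.txt
git commit -q -m 'First commit'
printf 'Line one\nLine two\n' > myfile.txt
git commit -q -m 'Second Commit' myfile.txt
printf 'Line one\nLine two\nLine four\n' > myfile.txt
git commit -q -m 'Third commit' myfile.txt
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git commit -q --fixup=HEAD^ myfile.txt

export GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true

# Stops with a conflict while applying the fixup.
git rebase -i --autosquash HEAD^^^ >/dev/null 2>&1 || true

# First resolution: what "Second Commit" should contain.
printf 'Line one\nLine two\nLine three\n' > myfile.txt
git add myfile.txt
# Stops again, this time while applying "Third commit".
git rebase --continue >/dev/null 2>&1 || true

# Second resolution: the final contents of the file.
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git add myfile.txt
git rebase --continue >/dev/null 2>&1

git --no-pager log --oneline
```

Whatever you stage at each stop is what Git records, so the two printf lines are where the manual fixing happens.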
5Use git config. To set it for all your repositories that don't already have it set, use git config --global. I use git config --global merge.conflictStyle diff3 to set it globally.