I am trying to do a git rebase onto master. The rebase has 28 steps, and at some of them I get conflicts. I make the adjustments, then I run git status, and the modified files appear. However, when I run git add {filename}, sometimes the files disappear from both the modified list and the changes to be committed list.
Is this because of a bug in Git, or because I have unintentionally made the code the same as the master branch?
Is [the disappearing status] ... because I have unintentionally made the code to be same as the master branch?
Probably, although "unintentionally" could be wrong: maybe you made it that way on purpose, without realizing that this was your purpose. It's not quite right to say "the same as the master branch", though. As j6t said in a comment, it means that the file is now identical to its version in the HEAD commit.
Before we get to details, let me go back to this:
However, when I do git add {filename}, sometimes the files disappear from the modified and the changes to be committed list.
Let's take a look at what git status actually does. First, let's define the work-tree, the index, a commit in general, and the HEAD commit specifically. Then, let's look at what a Git diff is. Then we can get to git status and look at the process of git rebase.
For this purpose, remember that a file tree (or just tree) is a collection of files, starting with a top level directory (or "folder" if you prefer that term), which may contain additional sub-directories ("sub-folders") as well as containing files. The tree is the top level directory with all its contents: all its own files, plus any sub-trees and their files, and any sub-sub-trees and so on.
The work-tree, the index, commits, and HEAD
Your work-tree is just that: the tree (directory) where you do your work. It has all your files in the normal formats that your editor and the rest of your computer can work with. (It can also have files that do not participate in Git: these are called untracked files. If you build source into object code, or turn Python into byte-compiled *.pyc files, for instance, those are kept as work-tree-only, i.e., untracked, on purpose.)
The index (also called the staging area, and sometimes the cache) is simply where you build the next commit. Running git add <path> copies the given <path> from the work-tree into the index, replacing the version of the file that was there before. When you eventually run git commit, Git turns whatever is in the index, including any subdirectories and their files as well as all the top-level files, into a new commit.1
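Commits are the main reason Git exists at all. Each commit stores one tree. That tree is a snapshot of what you had in your index when you made the commit. Each commit also stores some metadata. I won't define this term fully here, but instead just use the example of the actual metadata for each commit. These are:
- the author: the name, email address, and time stamp of whoever wrote the change;
- the committer: the same three items for whoever made the commit. These usually match the author, but they differ when someone else's change is committed by the person who runs git commit. This happens with emailed patches, for instance;
- the log message;
- the hash ID of the parent commit (or commits, for a merge).
Because each commit stores the ID of the commit that came right before it, a series or chain of commits lets us view the history of the development: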
A <- B <- C <-- master
Here commit C is the latest on master. (Its actual ID is some big ugly SHA-1 hash, badf00daddc0ffee... or whatever.) Commit C has the hash ID of commit B, which lets Git find commit B, and B has the ID of A. The name master is how Git finds commit C.
There is always a HEAD commit.2 This is your current commit. Normally, this is also the tip of some branch: for instance, you might be on branch master, as git status would say, and then HEAD would resolve to commit C. But you can have HEAD point to some other commit, and in this case, HEAD is just "the current commit".
Making a new commit turns the index into a snapshot (tree) and makes the new commit using that tree. The parent of the new commit is the old HEAD, and then Git updates HEAD so that it points to the new commit. If you're on a branch, Git does this updating by making the branch name point to the new commit:
A <- B <- C <- D <-- master (HEAD)
If you're not on a branch, then HEAD actually contains a raw commit ID. In this case, git commit writes the new commit ID directly into HEAD. (This is what happens during your conflicted git rebase, which is why I mention it.) But in any case, see how commit D here points back to commit C: the new snapshot always refers back to the previous one.
Again, the HEAD commit is always the current commit. We'll need this in a moment, when we get into the rebase action.
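The index-to-commit step described above can be checked in a scratch repository. This is only a sketch: the temporary directory, the file name file.txt, the commit messages, and the identity settings are assumptions made up for the illustration.

```shell
# Throwaway repository; every name below is made up for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

# First commit, playing the role of C.
printf 'first\n' > file.txt
git add file.txt
git commit -q -m "C"
old_head=$(git rev-parse HEAD)

# Stage a change, then ask Git which tree the index would become.
printf 'second\n' >> file.txt
git add file.txt
index_tree=$(git write-tree)

# Second commit, playing the role of D.
git commit -q -m "D"
new_tree=$(git rev-parse 'HEAD^{tree}')   # the snapshot stored in D
new_parent=$(git rev-parse 'HEAD^')       # the commit D points back to

# Raw view of the commit: tree, parent, author, committer, message.
git cat-file -p HEAD
```

Here index_tree and new_tree should be the same hash, and new_parent should equal old_head: the new commit stores the index as its snapshot and points back to the old HEAD.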
1. This isn't quite precise. The index is what you get if you recursively flatten a tree. This makes it easy(ish) to turn the index into a tree, so this is what Git does here: it turns the index into a tree, using git write-tree. This gets Git one of those big ugly SHA-1 hash IDs. Git then uses this hash ID for the new commit. By copying the index to a tree, then putting the tree ID in a commit, Git winds up saving the index's contents as the new commit's snapshot.
2. There is one exception to this rule. This exception is required by the fact that an initial, empty repository has no commits. Clearly, if there are no commits, it's impossible to resolve HEAD to a commit hash ID. For our purposes, though, we don't need to care about this special case of an "orphan" or "unborn" branch.
git diff, and two vs three trees
While git diff has a lot of options and usage patterns, the simplest and most straightforward is to compare two trees. One tree is labeled a and the other is b. The diff itself consists of a set of instructions, which mostly amount to things like: "To change a/README.txt to b/README.txt, remove the 12th line that's there now, and insert this other line for line 12. Here is some context around line 12 as well." This means that the file in question is named README.txt and is at the top level of the tree. If it were in some sub-tree, the output would say a/subdir/README.txt and b/subdir/README.txt, for instance.
One of the two trees is often your work-tree. You can also use the index as if it were a tree. Or, you can use any commit—such as the HEAD (current) commit—as a tree; Git simply finds the snapshotted tree that goes with that commit.
Rather than getting a set of instructions, "here's how to change README.txt", "here's how to change main.py", and so on, we often just want a list of file names. We can get this from git diff using --name-only or --name-status. The --name-only flag tells it to print only the name: README.txt or main.py. Using --name-status adds a status as well: M for modified, A for newly added, and so on.
Note that given any ordinary snapshot commit, with one parent commit, we can git diff that commit against its (single) parent. This will show us what changed in that commit. That's what git show and git log -p do: they print some information about the commit, then run git diff against the commit's parent.
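As a rough check of that claim, the sketch below builds a two-commit scratch repository and compares what git show reports for the latest commit with an explicit diff against its parent. The directory, file name, and commit messages are assumptions for illustration only.

```shell
# Throwaway repository; every name below is made up for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

printf 'the color purple\n' > README.txt
git add README.txt
git commit -q -m "first"

printf 'the colour purple\n' > README.txt
git commit -q -a -m "second"

# What git show reports for the commit (the empty --format= drops the header)...
shown=$(git show --format= --name-status HEAD)
# ...and the same comparison spelled out: parent tree vs commit tree.
diffed=$(git diff --name-status HEAD~1 HEAD)

printf 'git show: [%s]\ngit diff: [%s]\n' "$shown" "$diffed"

# The full set of instructions, with a/ and b/ labels.
git diff HEAD~1 HEAD
```

Both variables should hold the same single entry, M followed by README.txt.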
In any case, though, git diff only compares two trees at a time.3 But here you are, just about ready to run git commit, and you have, in effect, three trees:
- the HEAD commit;
- the index;
- the work-tree.
It would be nice to be able to compare all three. Enter git status.
3. Actually, git diff can compare more than two trees, producing what it calls a combined diff. The git show command does this for merge commits (git log -p normally just skips over them, diff-wise). But this is tricky, and more importantly, does not do what we want for git status.
git status
What git status does is to run two git diffs. Each one gets a slight variant of --name-status applied.
The first diff is HEAD vs index. This diff, between the current commit and your index, shows the "changes to be committed". Remember that git commit will write the index to the new commit. If we did that now (if we turned the current index into a new commit) and then viewed that commit as compared to the current commit, we'd see just what git log -p or git show would show. These would be our committed changes. So that's what this part of git status shows.
It doesn't print the actual diff, just file names and a verbose status (e.g., modified instead of just M). If we want the actual diff, we must run git diff --cached. This, which uses the old "cache" name for the index, compares HEAD vs the index.
Having shown us that, git status now runs a second git diff. This compares the index vs the work-tree. If there are files we have not yet git add-ed, this will show us which files those are. Again, we don't see the actual diff, just the file names and status. If we want the actual diff, we must run git diff, which compares index vs work-tree. Since these are changes we have not yet git add-ed, this second --name-status style diff from git status shows what we could git add. Once we do git add them, the index copy matches the work-tree copy, so this diff from git status will stop mentioning the file.
Note that in all this process, we're still getting two separate diffs: HEAD-vs-index, and index-vs-work-tree. What if we go straight to HEAD-vs-work-tree?
Well, git status won't do that, but we can: we can run git diff HEAD (without --cached this time). As always, we can use --name-status to get just file name and status, or leave it out to get a full diff.
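The three comparisons can be seen side by side in a scratch repository. In this sketch one file is changed and staged while another is changed but not staged; the directory and the file names a.txt and b.txt are assumptions for illustration.

```shell
# Throwaway repository; every name below is made up for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

printf 'one\n' > a.txt
printf 'two\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# a.txt: changed and staged.  b.txt: changed but not staged.
printf 'one, edited\n' > a.txt
git add a.txt
printf 'two, edited\n' > b.txt

staged=$(git diff --cached --name-status)    # HEAD vs index
unstaged=$(git diff --name-status)           # index vs work-tree
combined=$(git diff HEAD --name-status)      # HEAD vs work-tree

printf 'HEAD vs index:\n%s\n\n' "$staged"
printf 'index vs work-tree:\n%s\n\n' "$unstaged"
printf 'HEAD vs work-tree:\n%s\n' "$combined"
```

The first list should contain only a.txt, the second only b.txt, and the third both, which matches the two sections that git status prints plus the one comparison it does not make.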
Now, let's say that git status says that README.txt has changes to be committed, and that README.txt also has changes not staged for commit. This means HEAD-vs-index is different, and index-vs-work-tree is different. But what if the first change, HEAD vs index, is, say:
-the color purple
+the colour purple
(i.e., we went to British spelling). And what if the second change, from index to work-tree, is:
-the colour purple
+the color purple
(i.e., we changed back to American spelling). If we compare HEAD vs work-tree, using git diff HEAD, we won't see any changes at all!
If, at this point, we git add README.txt, the index copy becomes identical to the HEAD copy again, so we'll go from having "changes to be committed" and "changes not staged for commit" to having no changes at all. This is what you are seeing.
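The whole color/colour scenario can be reproduced in a scratch repository. This is a minimal sketch: the temporary directory and the identity settings are assumptions for illustration, while README.txt and the two spellings come from the example above.

```shell
# Throwaway repository; the path comes from mktemp and touches no real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

printf 'the color purple\n' > README.txt
git add README.txt
git commit -q -m "initial"

# HEAD-vs-index change: stage the British spelling.
printf 'the colour purple\n' > README.txt
git add README.txt

# Index-vs-work-tree change: go back to the American spelling, unstaged.
printf 'the color purple\n' > README.txt

status_before=$(git status --short)            # listed as staged AND unstaged
head_vs_worktree=$(git diff HEAD --name-only)  # empty: work-tree matches HEAD

# Staging the file makes the index match HEAD, so both lists become empty.
git add README.txt
status_after=$(git status --short)

printf 'before: [%s]\nHEAD vs work-tree: [%s]\nafter: [%s]\n' \
  "$status_before" "$head_vs_worktree" "$status_after"
```

Before the final git add, the short status shows MM for README.txt (modified in both comparisons); afterwards it shows nothing, even though no commit was made.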
The git rebase command is very much like repeating a lot of individual git cherry-pick commands. Remember those graphs we drew above, with three or four commits on master. Let's draw a bigger graph, with a side branch:
...--D--E--F   <-- master
         \
          G--H--I--J--K   <-- sidebr
Note that master points to commit F, while sidebr points to commit K. There are five commits on sidebr that are not on master. (Commits E and earlier are on both sidebr and master. This is a bit peculiar to Git.) To rebase sidebr onto master, we need to have Git copy each of these five commits.
The Git command that copies one commit is git cherry-pick. The way it copies the one commit is to turn it into a diff, by comparing it to its parent commit, then applying that diff to the place you would like it copied to. We want to copy G and have the copy come just after F, like this:
             G'   <-- HEAD
            /
...--D--E--F   <-- master
         \
          G--H--I--J--K   <-- sidebr
The new copy (the new commit) is "like G but slightly different", so we call it G'. Once we have G', we next want to copy H, and have the new copy come after G':
             G'-H'   <-- HEAD
            /
...--D--E--F   <-- master
         \
          G--H--I--J--K   <-- sidebr
We want to repeat this sequence until we have copied K to K':
             G'-H'-I'-J'-K'   <-- HEAD
            /
...--D--E--F   <-- master
         \
          G--H--I--J--K   <-- sidebr
Once they are all copied, the last thing we want (the last step for git rebase) is to move the branch label sidebr to point to the last commit we copied, abandoning the old chain:
             G'-H'-I'-J'-K'   <-- sidebr (HEAD)
            /
...--D--E--F   <-- master
         \
          G--H--I--J--K   [abandoned]
Now, during all this cherry-picking, it's possible that something in one of the commits, or even in many of them, is already done in commit F. In that case, since we're applying changes derived from scanning the old chain to a snapshot derived by starting from F, we'll hit cases where the cherry-picked commit does not apply cleanly and Git stops with a conflict.
Resolving the conflict may result in removing the change: it's not needed as a change because it's already in the new base. In this case, we'll stop having any change from HEAD (the last commit we successfully copied) to our index.
If we wind up removing all the changes from one of these commits, we'll have what Git likes to call an "empty" commit. (These aren't actually empty, they are just the same as the previous commit. It's not the commit that's empty, it's the git log -p patch that's empty.) Git by default won't make empty commits, so for these cases, we have to use git rebase --skip instead of git rebase --continue. Git tries to figure out, ahead of time, if there will be such "empty copies", and if so, to skip them in advance. But sometimes it can't figure that out: we only find out that skipping is right when we get there and resolve a conflict.
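Here is a small end-to-end sketch of that situation, run in a scratch repository. The branch layout is cut down to one commit on master (F) and two on sidebr (G and H); the directory, file contents, and commit messages are assumptions for illustration. The check before choosing --skip or --continue, git diff --cached --quiet, is just one possible way to ask whether anything is staged relative to HEAD.

```shell
# Throwaway repository; every name below is made up for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.email "you@example.com"
git config user.name "Example"

printf 'the color purple\n' > README.txt
git add README.txt
git commit -q -m "base"

# sidebr: G changes the spelling (plus a "!"), H adds an unrelated file.
git checkout -q -b sidebr
printf 'the colour purple!\n' > README.txt
git commit -q -a -m "G: British spelling"
printf 'notes\n' > notes.txt
git add notes.txt
git commit -q -m "H: add notes"

# master independently makes almost the same change.
git checkout -q master
printf 'the colour purple\n' > README.txt
git commit -q -a -m "F: British spelling"

# Copying G onto F conflicts, so the rebase stops (non-zero exit is expected).
git checkout -q sidebr
git rebase master >/dev/null 2>&1 || true

# Resolve by keeping what the new base already has, then stage the file.
printf 'the colour purple\n' > README.txt
git add README.txt
status_after_add=$(git status --short)    # empty: the file "disappeared"

# Nothing staged relative to HEAD means the copy of G would be "empty".
if git diff --cached --quiet; then
  git rebase --skip >/dev/null 2>&1
else
  GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
fi

git log --format=%s
```

After the skip, sidebr should contain H's copy directly on top of F, with no copy of G at all.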
I always find it a bit suspicious: did I really resolve this correctly? Is the change really in the new base? It's worth looking over the git log results from the new base, to make sure you did resolve the conflict correctly. But it can be correct; it may be intentional after all.