While Git rebasing, and fixing the conflict, the file doesn't appear to be modified anymore

I am trying to rebase my branch onto master. The rebase has 28 steps, and at some of them I get conflicts. I resolve the conflicts, then run git status, and the modified files appear. However, when I do git add {filename}, sometimes the files disappear from the modified and the changes to be committed list.

Is it because of some git bugs or because I have unintentionally made the code to be same as the master branch?

Solution

Is [the disappearing status] ... because I have unintentionally made the code to be same as the master branch?

Probably—although "unintentionally" could be wrong; maybe you made it that way on purpose, without realizing that this was your purpose. It's not quite right to say "the same as the master branch", though. As j6t said in a comment, it means that the file is now identical to the HEAD commit.

Before we get to details, let me go back to this:

However, when I do git add {filename}, sometimes the files disappear from the modified and the changes to be committed list.

Let's take a look at what git status actually does. First, let's define the work tree, the index, commits in general, and the HEAD commit in particular. Then, let's look at what a Git diff is. Then we can get to git status and look at the process of git rebase.

For this purpose, remember that a file tree (or just tree) is a collection of files, starting with a top level directory (or "folder" if you prefer that term), which may contain additional sub-directories ("sub-folders") as well as containing files. The tree is the top level directory with all its contents: all its own files, plus any sub-trees and their files, and any sub-sub-trees and so on.

The work tree, the index, commits, and `HEAD`

Your work tree is just that: the tree (directory) where you do your work. It has all your files in the normal formats that your editor and the rest of your computer can work with. (It can also have files that do not participate in Git: these are called untracked files. If you build source into object code, or turn Python into byte-compiled *.pyc files, for instance, those are kept as work-tree-only, i.e., untracked, on purpose.)

The index—which is also called the staging area, and sometimes the cache—is simply where you build the next commit. Using git add <path> copies the given <path> from the work-tree into the index, replacing the version of the file that was there before. When you eventually run git commit, Git turns whatever is in the index—which includes any subdirectories and their files, as well as all the top-level files—into a new commit.¹
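To see the "copy into the index" behavior directly, here is a minimal sketch in a throwaway repository. The file name and contents are made up for illustration; `git rev-parse :README.txt` prints the ID of the copy currently held in the index, and `git hash-object` prints the ID the work-tree file would get.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

echo "the color purple" > README.txt
git add README.txt
staged=$(git rev-parse :README.txt)       # ID of the copy in the index
worktree=$(git hash-object README.txt)    # ID of the work-tree contents

# Editing the work-tree file does not touch the index copy...
echo "the colour purple" > README.txt
after_edit=$(git rev-parse :README.txt)

# ...until git add copies the file into the index again.
git add README.txt
after_add=$(git rev-parse :README.txt)

echo "staged=$staged after_edit=$after_edit after_add=$after_add"
```

The index copy only changes at the two `git add` steps, which is why a later `git add` can make the index match `HEAD` again.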

Commits are the main reason Git exists at all. Each commit stores one tree. That tree is a snapshot of what you had in your index when you made the commit. Each commit also stores some metadata. I won't define this term fully here, but instead just use the example of the actual metadata for each commit. These are:

The tree itself. (The snapshot is a separate entity from the commit. We don't really need to care about that here, but it can matter later, and we might as well describe it properly.)
A list of parent commit IDs, usually just one ID. This is the commit that was in place just before you made the new commit.
An author: a name, an email address, and a time-stamp. This is the person who wrote the new code, or new text, or whatever it is that is "new" about this commit, as compared to the commit that was in place just before now.
A committer: the same idea as an author, just a second person in case the person who wrote the new commit is not the person who ran git commit. This happens with emailed patches, for instance.
A log message. This is free-form text, meant for whoever makes the commit to provide a good description of why they made this commit.

Because each commit stores the ID of the commit that came right before it, a series or chain of commits lets us view the history of the development:

A <- B <- C   <-- master

Here commit C is the latest on master. (Its actual ID is some big ugly SHA-1 hash, badf00daddc0ffee... or whatever.) Commit C has the hash ID of commit B, which lets Git find commit B, and B has the ID of A. The name master is how Git finds commit C.

There is always a HEAD commit.² This is your current commit. Normally, this is also the tip of some branch: for instance, normally you might be on branch master, as git status would say, and then HEAD would resolve to commit C. But you can have HEAD point to some other commit, and in this case, HEAD is just "the current commit".

Making a new commit turns the index into a snapshot (tree) and makes the new commit using that tree. The parent of the new commit is the old HEAD, and then Git updates HEAD so that it points to the new commit. If you're on a branch, Git does this updating by making the branch name point to the new commit:

A <- B <- C <- D   <-- master (HEAD)

If you're not on a branch, then HEAD actually contains a raw commit ID. In this case, git commit writes the new commit ID directly into HEAD. (This is what happens during your conflicted git rebase, which is why I mention it.) But in any case, see how commit D here points back to commit C: the new snapshot always refers back to the previous one.
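The following sketch checks these claims in a scratch repository: the new commit's tree is what `git write-tree` produces from the index, its parent is the old `HEAD`, and a commit made with a detached `HEAD` leaves the branch name where it was. The file name, commit messages, and demo identity are assumptions for illustration only.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > a.txt
git add a.txt
git commit -q -m C
old_head=$(git rev-parse HEAD)

echo two >> a.txt
git add a.txt
index_tree=$(git write-tree)              # the index, turned into a tree
git commit -q -m D
commit_tree=$(git rev-parse 'HEAD^{tree}')
parent=$(git rev-parse 'HEAD^')
d_commit=$(git rev-parse HEAD)

# Detached HEAD: the new commit ID goes straight into HEAD,
# and the name master keeps pointing to D.
git checkout -q --detach
echo three >> a.txt
git commit -q -am E
master_tip=$(git rev-parse master)
detached_parent=$(git rev-parse 'HEAD^')

echo "tree from index:  $index_tree"
echo "tree in commit D: $commit_tree"
```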

Again, the HEAD commit is always the current commit. We'll need this in a moment, when we get into the rebase action.

¹This isn't quite precise. The index is what you get if you recursively flatten a tree. This makes it easy(ish) to turn the index into a tree—so this is what Git does here: it turns the index into a tree, using git write-tree. This gets Git one of those big ugly SHA-1 hash IDs. Git then uses this hash ID for the new commit. By copying the index to a tree, then putting the tree ID in a commit, Git winds up saving the index's contents as the new commit's snapshot.

²There is one exception to this rule. This exception is required by the fact that an initial, empty repository has no commits. Clearly, if there are no commits, it's impossible to resolve HEAD to a commit hash ID. For our purposes, though, we don't need to care about this special case of an "orphan" or "unborn" branch.

`git diff`, and two vs three trees

While git diff has a lot of options and usage patterns, the simplest and most straightforward is to compare two trees. One tree is labeled a and the other is b. The diff itself consists of a set of instructions, which mostly amount to things like: "To change a/README.txt to b/README.txt, remove the 12th line that's there now, and insert this other line for line 12. Here is some context around line 12 as well." This means that the file in question is named README.txt and is at the top level of the tree—if it were in some sub-tree, the output would say a/subdir/README.txt and b/subdir/README.txt, for instance.

One of the two trees is often your work-tree. You can also use the index as if it were a tree. Or, you can use any commit—such as the HEAD (current) commit—as a tree; Git simply finds the snapshotted tree that goes with that commit.

Rather than getting a set of instructions, "here's how to change README.txt", "here's how to change main.py", and so on, we often just want a list of file names. We can get this from git diff using --name-only or --name-status. The --name-only flag tells it to print only the name: README.txt or main.py. Using --name-status adds a status as well: M for modified, A for newly added, and so on.

Note that given any ordinary snapshot commit, with one parent commit, we can git diff that commit against its (single) parent. This will show us what changed in that commit. That's what git show and git log -p do: they print some information about the commit, then run git diff against the commit's parent.
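As a small illustration, the sketch below makes two commits in a scratch repository and compares the second against its parent with both flags. The file names and the demo identity are invented for the example.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > README.txt
echo 'print("hi")' > main.py
git add .
git commit -q -m first

echo more >> README.txt     # modify an existing file
echo x > new.txt            # add a new file
git add .
git commit -q -m second

# Compare the commit's parent (tree "a") with the commit itself (tree "b").
names=$(git diff --name-only HEAD~1 HEAD)
status=$(git diff --name-status HEAD~1 HEAD)

echo "$names"
echo "$status"
```

Here main.py is not listed at all because it is identical in both trees; README.txt is reported as M and new.txt as A.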

In any case, though, git diff only compares two trees at a time.³ But here you are, just about ready to run git commit, and you have, in effect, three trees:

your HEAD (current) commit;
your index; and
your work-tree.

It would be nice to be able to compare all three. Enter git status.

³Actually, git diff can compare more than two trees, producing what it calls a combined diff. The git show command does this for merge commits (git log -p normally just skips over them, diff-wise). But this is tricky, and more importantly, does not do what we want for git status.

`git status`

What git status does is to run two git diffs. Each one gets a slight variant of --name-status applied.

The first diff is HEAD vs index. This diff, between the current commit and your index, shows the "changes to be committed". Remember that git commit will write the index to the new commit. If we did that now (if we turned the current index into a new commit) and then viewed that commit as compared to the current commit, we'd see just what git log -p or git show would show. These would be our committed changes. So that's what this part of git status shows.

It doesn't print the actual diff, just file names and a verbose status (e.g., modified instead of just M). If we want the actual diff, we must run git diff --cached. This—which uses the old "cache" name for the index—compares HEAD vs the index.

Having shown us that, git status now runs a second git diff. This compares the index vs the work-tree. If there are files we have not yet git add-ed, this will show us which files those are. Again, we don't see the actual diff, just the file names and status. If we want the actual diff, we must run git diff, which compares index vs work-tree. Since these are changes we have not yet git add-ed, this second --name-status style diff from git status shows what we could git add. Once we do git add them, they will be in the index, so this diff from git status will stop mentioning the file.
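The two diffs can be run by hand and set next to the status output. This is only a sketch in a scratch repository, with made-up file names; it uses `git status --porcelain`, whose two status columns correspond to the two comparisons.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > staged.txt
echo one > unstaged.txt
git add .
git commit -q -m base

echo two >> staged.txt
git add staged.txt          # this change is now in the index
echo two >> unstaged.txt    # this change is only in the work-tree

to_be_committed=$(git diff --cached --name-status)   # HEAD vs index
not_staged=$(git diff --name-status)                 # index vs work-tree
summary=$(git status --porcelain)                    # both, one line per file

echo "HEAD vs index:      $to_be_committed"
echo "index vs work-tree: $not_staged"
echo "$summary"
```

In the porcelain output, the first column is the HEAD-vs-index status and the second column is the index-vs-work-tree status.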

What if we change something, then change it back?

Note that in all this process, we're still getting two separate diffs: HEAD-vs-index, and index-vs-work-tree. What if we go straight to HEAD-vs-work-tree?

Well, git status won't do that, but we can: we can run git diff HEAD (without --cached this time). As always, we can use --name-status to get just file name and status, or leave it out to get a full diff.

Now, let's say that git status says that README.txt has changes to be committed, and that README.txt has changes not staged for commit. This means HEAD-vs-index is different, and index-vs-work-tree is different. But what if the first change—HEAD vs index—is, say:

-the color purple
+the colour purple

(i.e., we went to British spelling). And what if the second change, from index to work-tree, is:

-the colour purple
+the color purple

(i.e., we changed back to American spelling). If we compare HEAD vs work-tree, using git diff HEAD, we won't see any changes at all!

If, at this point, we git add README.txt, we'll go from having "changes to be committed" and "changes not staged for commit" to having no changes. This is what you are seeing.
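The spelling example can be reproduced in a scratch repository. The sketch below assumes a one-line README.txt and a demo identity; everything else follows the steps described above.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "the color purple" > README.txt
git add README.txt
git commit -q -m "American spelling"

echo "the colour purple" > README.txt
git add README.txt                        # index: British spelling
echo "the color purple" > README.txt      # work-tree: American again

before=$(git status --porcelain)          # both diffs report a change
head_vs_worktree=$(git diff HEAD --name-only)   # but HEAD and work-tree match

git add README.txt                        # index now matches HEAD again
after=$(git status --porcelain)

echo "before add: '$before'"
echo "HEAD vs work-tree: '$head_vs_worktree'"
echo "after add:  '$after'"
```

Before the final `git add`, the file shows up in both lists; afterwards it shows up in neither, because all three trees hold the same contents.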

Rebase is repeated cherry-pick

The git rebase command is very much like repeating a lot of individual git cherry-pick commands. Remember those graphs we drew above, with three or four commits on master. Let's draw a bigger graph, with a side branch:

...--D--E--F       <-- master
         \
          G--H--I--J--K   <-- sidebr

Note that master points to commit F, while sidebr points to commit K. There are five commits on sidebr that are not on master. (Commits E and earlier are on both sidebr and master. This is a bit peculiar to Git.) To rebase sidebr onto master, we need to have Git copy each of these five commits.

The Git command that copies one commit is git cherry-pick. The way it copies the one commit is to turn it into a diff, by comparing it to its parent commit, then applying that diff to the place you would like it copied-to. We want to copy G and have the copy come just after F, like this:

             G'  <-- HEAD
            /
...--D--E--F       <-- master
         \
          G--H--I--J--K   <-- sidebr

The new copy—the new commit—is "like G but slightly different", so we call it G'. Once we have G', we next want to copy H, and have the new copy come after G':

             G'-H'  <-- HEAD
            /
...--D--E--F       <-- master
         \
          G--H--I--J--K   <-- sidebr

We want to repeat this sequence until we have copied K to K':

             G'-H'-I'-J'-K'  <-- HEAD
            /
...--D--E--F       <-- master
         \
          G--H--I--J--K   <-- sidebr

Once they are all copied, the last thing we want—the last step for git rebase—is to move the branch label sidebr to point to the last commit we copied, abandoning the old chain:

             G'-H'-I'-J'-K'  <-- sidebr (HEAD)
            /
...--D--E--F       <-- master
         \
          G--H--I--J--K   [abandoned]
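To make the "repeated cherry-pick" idea concrete, here is a sketch with a shortened side branch (only G and H) in a scratch repository. It copies the commits by hand with git cherry-pick, then lets git rebase do the same job on a second branch name, by-rebase, which is a hypothetical name introduced only for the comparison.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt; git add .; git commit -q -m E
git checkout -q -b sidebr
echo g > g.txt; git add .; git commit -q -m G
echo h > h.txt; git add .; git commit -q -m H
git checkout -q master
echo f > f.txt; git add .; git commit -q -m F
git branch by-rebase sidebr               # second name for the same chain

# By hand: detach HEAD at F, then copy G and H one at a time.
git checkout -q --detach master
git cherry-pick sidebr~1 >/dev/null       # makes G'
git cherry-pick sidebr >/dev/null         # makes H'
manual_tree=$(git rev-parse 'HEAD^{tree}')
manual_log=$(git log --format=%s master..HEAD)

# Let git rebase do the copying, and the final label move, for by-rebase.
git rebase -q master by-rebase
rebased_tree=$(git rev-parse 'by-rebase^{tree}')
rebased_log=$(git log --format=%s master..by-rebase)

echo "manual:  $manual_tree"
echo "rebased: $rebased_tree"
```

Both routes end with the same snapshot and the same two copied commits on top of F; the only step the manual version leaves out is moving the branch label.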

Now, during all this cherry-picking, it's possible that something in one of the commits—or even in many of them—is already done in commit F. In that case, since we're applying changes derived from scanning the old chain, to a snapshot derived by starting from F, we'll hit cases where the cherry-picked commit does not apply properly.

Resolving the conflict may result in removing the change: it's not needed as a change because it's already in the new base. In this case, we'll stop having any change from HEAD—the last commit we successfully copied—to our index.

If we wind up removing all the changes from one of these commits, we'll have what Git likes to call an "empty" commit. (These aren't actually empty, they are just the same as the previous commit. It's not the commit that's empty, it's the git log -p patch that's empty.) Git by default won't make empty commits, so for these cases, we have to use git rebase --skip instead of git rebase --continue. Git tries to figure out, ahead of time, if there will be such "empty copies", and if so, to skip them in advance. But sometimes it can't figure that out—we only find out that skipping is right when we get there and resolve a conflict.
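The sketch below reproduces this situation in a scratch repository, under some assumptions: the branch names match the graphs above, the file contents are invented, and the conflict is resolved by simply keeping the version from the new base. During a rebase, `git checkout --ours` selects the version from `HEAD`, that is, from the commits being built on, which is the opposite of what the name suggests.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "the color purple" > README.txt
git add README.txt; git commit -q -m "E: base"

git checkout -q -b sidebr
echo "the colour purple" > README.txt
git commit -q -am "G: British spelling"
echo notes > notes.txt
git add notes.txt; git commit -q -m "H: add notes"

git checkout -q master
echo "the colour purple." > README.txt
git commit -q -am "F: British spelling, with a full stop"

git checkout -q sidebr
git rebase master >/dev/null 2>&1         # stops: copying G conflicts with F

git checkout --ours -- README.txt         # keep the version from HEAD (F)
git add README.txt
status_mid=$(git status --porcelain)      # nothing left to commit

git rebase --skip >/dev/null 2>&1         # drop G, go on to copy H
copied=$(git log --format=%s master..sidebr)

echo "status after resolving: '$status_mid'"
echo "commits copied: $copied"
```

After the resolution, the file vanishes from both status lists, exactly as in the question, and only H is copied on top of F.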

I always find it a bit suspicious: did I really resolve this correctly? The change really is in the new base? It's worth looking over the git log results from the new base, to make sure you did resolve the conflict correctly. But it can be correct; it may be intentional after all.
