This might be a noob question.
Suppose I have a Git repo that already has some files in the staging area (added with git add), and then I run git reset --soft @~.
I am happy to see that the files I committed last time are now in the staging area.
But how? I checked the .git folder. The only things that changed are the ref of the current branch and an "ORIG_HEAD" file, which I think is not relevant. The most suspicious file, index, was not touched at all. Also, can anyone tell me how to view its contents?
So how does git do this? Thanks.
In its simplest form,[1] git reset does two things:
To understand how and why this works and what it does, you need to know how commits work and how the index works, at least at a relatively high level. These are closely tied together anyway.
First, a commit is simply a repository object of type "commit", which has as its data, the commit message and some other information (tree, parents, author, and committer):
$ git cat-file -p 5f95c9f850b19b368c43ae399cc831b17a26a5ac
tree 972825cf23ba10bc49e81289f628e06ad44044ff
parent 9c8ce7397bac108f83d77dfd96786edb28937511
author Junio C Hamano <[email protected]> 1392406504 -0800
committer Junio C Hamano <[email protected]> 1392406504 -0800
Git 1.9.0
Signed-off-by: Junio C Hamano <[email protected]>
This commit is part of the source to git (it's the commit for git version 1.9.0). As with all repository objects, its name is a 40-hex-character SHA-1 value.
The working directory for a commit is determined by the tree, which is yet another git object, so it has another SHA-1 name. The output from git cat-file -p 972825cf23ba10bc49e81289f628e06ad44044ff is too long to include entirely, but it starts with:
100644 blob 5e98806c6cc246acef5f539ae191710a0c06ad3f .gitattributes
100644 blob b5f9defed37c43b2c6075d7065c8cbae2b1797e1 .gitignore
100644 blob 11057cbcdf4c9f814189bdbf0a17980825da194c .mailmap
100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42 COPYING
040000 tree 47fca99809b19aeac94aed024d64e6e6d759207d Documentation
100755 blob 2b97352dd3b113b46bbd53248315ab91f0a9356b GIT-VERSION-GEN
These blob entries are the files that make up the source to git (each tree entry is a sub-directory, which in turn holds more blobs). Each blob has a unique SHA-1 ID, based on the contents of the file. The tree keeps a list of each file's "mode" (really just its x bit: these modes are all 100644 and 100755) and file name, along with the SHA-1 name of the blob object in the repository. (Other modes, like the 040000 seen above, keep track of sub-trees, symbolic links, and submodules. It's only blobs that are restricted to 100644 and 100755.)
Every git repository object is read-only. The commit whose ID is 5f95c9f... will never change. It will always have 972825c... as its (single) tree. The file whose ID is 536e555... is always that particular version of the file COPYING. If the file is updated, a new, different blob with a new, different SHA-1 goes in.
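A quick way to see that a blob's name depends only on its contents is to hash the same bytes twice and then hash slightly different bytes. This is a minimal sketch in a throwaway repository; the temporary directory and the sample strings are made up for illustration.

```shell
set -e
# Scratch repository created by mktemp; nothing here touches a real project.
repo=$(mktemp -d) && cd "$repo" && git init -q

# hash-object without -w only computes the ID; it does not store the blob.
first=$(printf 'first version\n' | git hash-object --stdin)
again=$(printf 'first version\n' | git hash-object --stdin)
changed=$(printf 'second version\n' | git hash-object --stdin)

echo "same content:    $first $again"
echo "changed content: $changed"
```

The first two IDs should match, and the third should differ, which is why an updated file always lands in the repository as a new blob rather than as a modification of the old one.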
Git's "index" (also called the "staging area" and sometimes the "cache") is a poorly-documented file that, in essence, represents "what will go in the next commit".
Unlike repository objects, the index is quite writable. To make "the next commit" have something different, git adds or removes entries from the index. For instance, to update the file named COPYING, you would (after editing it) run git add COPYING. This would take the new contents of the file COPYING and copy them into the repository (where they will eventually live forever),[2] computing an SHA-1 "true name" for the result. This new SHA-1 then goes into the index, along with the mode and the name COPYING: basically, everything needed to make a commit.
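As for viewing the index: git ls-files --stage prints each entry's mode, blob SHA-1, stage number, and path. The sketch below, run in a scratch repository with a made-up COPYING file, shows that editing the file leaves the index entry alone and that git add is what swaps in the new blob ID.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"
before=$(git ls-files --stage -- COPYING)

echo "second version" > COPYING            # edit the work-tree file only
unchanged=$(git ls-files --stage -- COPYING)

git add COPYING                            # new blob ID goes into the index
after=$(git ls-files --stage -- COPYING)

printf '%s\n%s\n' "$before" "$after"
```

The two printed lines should differ only in the SHA-1 column.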
Because the index has everything prepared like this, it's pretty easy to make a new commit. All the correct blobs are already in the repository. Git only needs to turn the index into some tree object(s), write those into the repository, get the final SHA-1 of the newest top-level tree, and write a new commit object. The new commit will have the following properties:
- Its tree is whatever gets written based on the index.
- Its parent is whatever is in HEAD now (more or less; there's some fiddling around with multiple parents when making merge commits).
- Its author and committer, and their dates, are taken from the current time and your git configuration user.name and user.email, or from arguments (--author) or environment variables if those are set to override things.
- Its message is whatever you supply, for example with the -m parameter.
So git writes that commit, which produces a new, unique SHA-1. It then writes that SHA-1 itself somewhere.
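The steps above can be reproduced by hand with git's plumbing commands. This is only a sketch of the idea, not a claim about what git commit does internally; the scratch repository, file name, and messages are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"

echo "second version" > COPYING
git add COPYING                            # the index now describes the next commit

tree=$(git write-tree)                     # turn the index into a tree object
parent=$(git rev-parse HEAD)               # the current commit becomes the parent
commit=$(git commit-tree "$tree" -p "$parent" -m "second")
git update-ref refs/heads/master "$commit" # write the new ID into the branch

git cat-file -p "$commit"
```

The final cat-file output should show the same tree, parent, author, and committer fields described above.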
HEAD
If you're "on branch master", as git status would say, that means the file .git/HEAD contains the literal string ref: refs/heads/master. This is what git calls an "indirect reference" (its documentation usually says "symbolic ref"): a reference that just says "go find another reference, here's the name." Usually you are on some branch, and HEAD is an indirect reference to that branch.
The branch itself can be stored in several different ways, but the simplest is another file in .git, in this case the file .git/refs/heads/master. If that file exists and you read it, it will contain an SHA-1 like 5f95c9f850b19b368c43ae399cc831b17a26a5ac. That's the current commit, and is how git knows which commit you're "on", just like the ref: refs/heads/master is how git knows that you're on branch master.
To make a new commit, git writes the commit as described above, which produces a new unique SHA-1. Then, since you're on branch master, git simply writes the new commit ID into .git/refs/heads/master, and now you're on the new commit, which is the tip of branch master.
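To watch this happen, the sketch below reads .git/HEAD and the branch tip before and after a commit in a scratch repository (file name and messages are placeholders). It reads the branch through git rev-parse rather than the file, since the branch may not always be stored as a loose file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"
head_before=$(cat .git/HEAD)                   # the indirect reference
tip_before=$(git rev-parse refs/heads/master)  # the commit the branch names

echo "second version" > COPYING
git add COPYING
git commit -q -m "second"
head_after=$(cat .git/HEAD)
tip_after=$(git rev-parse refs/heads/master)

echo "HEAD:   $head_before -> $head_after"
echo "master: $tip_before -> $tip_after"
```

HEAD should read ref: refs/heads/master both times, while the ID behind master changes.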
You can also have a "detached HEAD", which (despite sounding like something from the French Revolution) just means that HEAD is not an indirect reference. Instead, HEAD contains a raw SHA-1. In this case, to make a new commit, git makes the commit the same way as before, but instead of updating .git/refs/heads/master, it writes the new commit ID right into HEAD.
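A small experiment along the same lines, assuming a scratch repository and using git checkout --detach to get into the detached state:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"
first=$(git rev-parse HEAD)

git checkout -q --detach                   # HEAD now holds a raw commit ID
detached_head=$(cat .git/HEAD)

echo "second version" > COPYING
git add COPYING
git commit -q -m "second"                  # the new ID is written into HEAD itself
new_head=$(cat .git/HEAD)
master_tip=$(git rev-parse refs/heads/master)

echo "HEAD after detach: $detached_head"
echo "HEAD after commit: $new_head"
echo "master still at:   $master_tip"
```

The branch master should stay at the first commit, because nothing told git to update it.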
So, with all that in mind, let's look concretely at what git reset does.
If you do a --soft reset, git leaves the index completely untouched. This means it only updates the current branch.
To update the current branch, git does the same thing as when making a new commit: it finds which branch HEAD indirects to, and writes a new SHA-1 into that reference. If HEAD points to master, this only needs to write a new SHA-1 into .git/refs/heads/master.
The SHA-1 that git writes is the one you supply on the command line:
git reset --soft @~ # @~ means @~1, which means HEAD~1, aka HEAD^
You can see what the SHA-1 will be by running git rev-parse (for a HEAD-relative ref, you must do this before the reset changes HEAD, of course):
$ git rev-parse @~
9c8ce7397bac108f83d77dfd96786edb28937511
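This is also why the previously committed files show up as staged: the index is byte-for-byte the same, but it is now being compared against an older commit. The sketch below recreates the situation from the question in a scratch repository (the file names COPYING and NOTES are placeholders).

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"
echo "second version" > COPYING
git add COPYING
git commit -q -m "second"
echo "a note" > NOTES
git add NOTES                              # something already staged

old_tip=$(git rev-parse HEAD)
target=$(git rev-parse @~)                 # must be resolved before the reset
index_before=$(git ls-files --stage)

git reset --soft @~

index_after=$(git ls-files --stage)
staged=$(git diff --cached --name-only)    # index compared against the new HEAD

echo "master now at: $(git rev-parse refs/heads/master)"
echo "staged files:"
echo "$staged"
```

Both COPYING and NOTES should be listed as staged, even though the index entries did not change.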
If you tell git reset to use --mixed, it also updates the index. The things it puts into the index come from the commit whose SHA-1 it writes into the branch:
$ git reset --mixed HEAD -- COPYING
Here, by telling it to reset HEAD to HEAD, you get reset to move the branch no distance at all from where it used to be, so the branch does not get updated after all; but the -- COPYING says "extract the SHA-1 for file COPYING from the target revision HEAD, and put that SHA-1 into the index for the file COPYING." This means that the next commit won't have changes to the file COPYING, because we've put the old SHA-1 back into the index.
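Here is a sketch of that path-limited reset in a scratch repository. It omits --mixed, which is the default mode anyway; as far as I know, recent git versions print a deprecation warning when --mixed is combined with paths.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"

echo "second version" > COPYING
git add COPYING                            # stage the edited file
staged=$(git ls-files --stage -- COPYING)

git reset -q HEAD -- COPYING               # copy HEAD's blob ID back into the index
restored=$(git ls-files --stage -- COPYING)
committed=$(git rev-parse HEAD:COPYING)    # blob ID recorded in the commit

echo "staged:    $staged"
echo "restored:  $restored"
echo "work tree: $(cat COPYING)"
```

The restored index entry should carry the committed blob ID, while the work-tree file keeps the edit.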
If you tell git reset to use --hard, it also updates the working directory (it's already updating the branch and the index). It does this by getting the actual file (or files) contents out of the repository (looking them up from the unique blob SHA-1s), and overwriting the work-directory version. If you haven't git add-ed and git commit-ed those work-directory versions, this means the changes are gone. (If you did git add, they're in the repository, but if you have not done a git commit they're eligible for garbage collection; see footnote 2.)
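The sketch below stages an edit, does a hard reset, and then checks two things: the work-tree file is back to the committed contents, and the staged blob is still sitting in the object store (until garbage collection gets to it). Repository and file names are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first version" > COPYING
git add COPYING
git commit -q -m "first"

echo "staged but never committed" > COPYING
git add COPYING
blob=$(git rev-parse :COPYING)             # ID of the blob that git add stored

git reset -q --hard HEAD                   # branch, index, and work tree all reset

content=$(cat COPYING)
kind=$(git cat-file -t "$blob")            # the object itself is still there

echo "work tree: $content"
echo "orphaned object $blob is a $kind"
```

If the edit had never been added, there would be no blob to recover.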
Since you used --soft, you suppressed changes to the index, so the only thing git reset could do is change the contents of the branch tip file, .git/refs/heads/master.
[1] git reset used to have just these three operating modes. It now has --merge and --keep, plus --patch, that do more than the simple cases. It's kind of like the Monty Python skit about the Spanish Inquisition: "Our three modes are soft, mixed, hard, and merge. ... Four! Our four modes are soft, mixed, hard, merge, and keep..."
[2] Objects in the repository "live forever" with one very large exception: an unreferenced object, one that git fsck shows as dangling, is a candidate for garbage collection. Unreferenced blobs, commits, and so on are perfectly normal. They sit around occupying disk space (usually very little: objects are stored compressed) so that you can recover things, and so that they can be collected and discarded all at once later if and when git thinks it's a good idea to clean up.
Objects are "referenced" (and therefore live forever) when some external label (a branch name, a tag, HEAD, or whatever) points to them directly or indirectly. A branch name points to the tip-most commit on that branch. That commit points to its tree, which points to any sub-trees and blobs, so all of those remain forever; and that commit points to its parent commit(s), so those parents remain forever. Each parent commit points in turn to its own parents, and those also remain forever.
A commit becomes un-referenced when you move the branch label away from it:
A <- B <- C <-- HEAD=master
Here master (our current branch) points to C, C to B, and B to A. But if we run:
$ git reset --hard HEAD^
we make master point to B, which points to A. Commit C is now unreferenced: it has been abandoned, and eventually it will be garbage-collected, along with its tree and any sub-trees and blobs that no other commit uses. Similar events occur with, e.g., git commit --amend, which does a soft-reset-and-new-commit, making a new commit D that points to B, and having master point to D:
A - B - D   <-- HEAD=master
      \
        C   [abandoned]
The rebase operation copies and then abandons entire sequences of commits, generating a lot of candidate objects for garbage collection. This is why dangling objects are normal.
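The A, B, C example can be tried out directly. In this sketch (scratch repository, placeholder file contents) commit C is abandoned by a hard reset, yet it can still be read back by its SHA-1 because nothing has been garbage-collected yet.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for name in A B C; do                      # three commits: A <- B <- C
    echo "contents for $name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
b=$(git rev-parse 'HEAD^')
c=$(git rev-parse HEAD)

git reset -q --hard 'HEAD^'                # master now points to B

tip=$(git rev-parse refs/heads/master)
kind=$(git cat-file -t "$c")               # C is still in the object store
holders=$(git branch --contains "$c")      # but no branch reaches it any more

echo "master: $tip"
echo "abandoned $kind: $c"
```

In practice the reflog also keeps such commits reachable for a while, which is one more reason they do not disappear immediately.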