Trying to get a real handle on git :) Is git pull a repository-wide operation? Meaning, does it update your local branches (which are tracking remote branches) across the repository, or does it only fetch and merge for the currently checked-out branch?
Is the same true for push? What does --all do for push and pull?
Any help would rock!
Also, what does fetch do? Does it grab the info (files inside of the .git folder) for a specific branch? Or is the .git folder consistent across the whole repo? If I do fetch instead of clone, I can't really do anything after that, what do I do after fetching?
The answer is "both and neither", really. Or "it depends". Or something like that!
First, there are two basic operations to consider: fetch and push. (The pull operation is just a shell script built on top of fetch, so once you know how that works, we can explain pull properly.)
Both fetch and push have access to entire repositories. But in general, they do not work by sending entire repositories over the wire (or other communications channel). They work based on references.
The fetch and push operations generally take "refspecs", which are reference-pairs (remote:local and local:remote respectively) plus an optional "force" flag prefix, +. However, they can be given just a simple reference, and the force flag can be specified with -f or --force.
Both commands have been around for a long time and have accumulated a lot of "old stuff". The "modern" way to work with remote repositories is through the thing called a "remote", using git remote add to create them (and git clone creates one called origin by default). These turn into entries in the .git/config file:
[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
url = ssh://...
The url = line gives the URL for both fetch and push, though there can be an extra pushurl = line if needed, to make pushes go somewhere else. (There are "old ways" to run fetch and push and supply URLs directly, and so on, but let's just ignore all of them ... remotes are much better!) This also supplies refspecs (well, one refspec, in this case) for git fetch.
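To see those entries appear, here is a small sketch that sets up a remote in a scratch repository and reads the values back. The remote name upstream and the example.com URLs are made up for illustration; nothing is contacted over the network.

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed at the end
git init -q "$work/demo"
cd "$work/demo"

# "upstream" and both URLs are hypothetical names used only for this demo.
git remote add upstream https://example.com/project.git
git remote set-url --push upstream ssh://git@example.com/project.git

# Read back what git wrote under [remote "upstream"] in .git/config.
fetch_spec=$(git config --get remote.upstream.fetch)
fetch_url=$(git config --get remote.upstream.url)
push_url=$(git config --get remote.upstream.pushurl)
echo "fetch   = $fetch_spec"
echo "url     = $fetch_url"
echo "pushurl = $push_url"

cd /
rm -rf "$work"
```

The fetch line is generated from the remote's name, which is why it ends in refs/remotes/upstream/* here rather than refs/remotes/origin/*.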
With that out of the way, let's start with another command entirely, git ls-remote. This works like a fetch but without actually fetching anything:
$ git ls-remote origin
676699a0e0cdfd97521f3524c763222f1c30a094 HEAD
222c4dd303570d096f0346c3cd1dff6ea2c84f83 refs/heads/branch
676699a0e0cdfd97521f3524c763222f1c30a094 refs/heads/master
d41117433d7b4431a188c0eddec878646bf399c3 refs/tags/tag-foo
This tells us that the remote named origin has three ref-names. Two are branches and one is a tag. (The special HEAD ref has the same SHA-1 as refs/heads/master, so git will guess that the remote is "on branch master", as git status might say. There's a bug of sorts in the remote protocol: git should be able to say "HEAD is a symbolic ref, pointing to refs/heads/master", so that your end does not have to guess. This would fix the case of two branches having the same SHA-1 as HEAD.)
When you run git fetch origin, the fetch operation starts out with the same ls-remote, more or less, and thus sees all the branches and tags. If you use --tags it brings over all the tags too, otherwise it does something fairly complicated [1] that brings over all the branches and some tags. It sees all other references as well, but by default, it does not bring those over: for instance, the remote might have refs/notes/commits, which is used by git notes, but that one does not come over.
When you alter the refspecs given to git fetch, though, you change what gets brought over. The default is the one right there in .git/config, fetch = +refs/heads/*:refs/remotes/origin/*. This refspec says to bring over all refs/heads/* references (all branches) and store them locally under refs/remotes/origin/ using the same name as the branch-name on the remote. Using --tags adds one additional refspec: refs/tags/*:refs/tags/*. That's how git brings over all their tags: everything matching refs/tags/*, which is all the tags, goes into your local refs/tags/ under the matching name.
(You can add more fetch = lines and bring more stuff over. See this answer on "remote tags" for an example.)
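Here is a rough sketch of that mapping in action, using two throwaway repositories on the local disk. The directory names theirs and yours and the demo identity are assumptions for the example; the branch and tag names mirror the ls-remote output above.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# "Their" repository: two branches and one tag, all on the same commit.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git branch branch
git tag tag-foo

# "Your" repository starts empty and only knows theirs as the remote "origin".
git init -q "$work/yours"
cd "$work/yours"
git remote add origin "$work/theirs"
git fetch -q origin

# List every reference you now have.
refs=$(git for-each-ref --format='%(refname)')
echo "$refs"

cd /
rm -rf "$work"
```

Their branches land under refs/remotes/origin/, the tag lands directly in refs/tags/, and nothing is created under your own refs/heads/. Newer git versions may also list refs/remotes/origin/HEAD.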
Now, just bringing over the reference name won't do much good unless git also brings over any required underlying objects [2], as identified by their SHA-1s. Let's say you already have 676699a..., but not 222c4dd.... (You're up to date on master but not on branch. Maybe you don't even have branch branch yet.) The fetch operation needs to bring over that commit for sure. That commit probably needs various files, and previous commits, and so on. So your git fetch communicates with the thing on the remote that's looking at the other git repository, and they have a little conversation, where each one tells the other what SHA-1s they have now, and which ones they still need. If yours needs 222c4dd..., it asks the other end "what else do I need to use 222c4dd...", checks to see if it has those, adds them to its list if not, checks those in more detail once added, and so on.
Having finally agreed on what to exchange, their git sends you the objects—usually in a "thin pack" if possible (the details depend on the transport)—and your git unpacks and/or repacks them as needed, and then updates your local references for any new branches, tags, or other references brought over. (By default, your git just stores their branches in your "remote branches"—your copy of "what they had the last time I talked with them"—but updates your tags. That is, there are no "remote tags", just "remote branches".)
As a special case, if you give git fetch any arguments beyond the name of the remote, as in:
git fetch origin master
these refspecs override the ones in the config file, and (in git versions predating 1.8.4) prevent updating "remote branches". This generally limits what's fetched, sometimes quite a bit. (In 1.8.4 and later, they still limit the fetch, but the remote-branch gets updated anyway, which makes more sense.) Here, a refspec that is missing a colon, like the one above, is not treated as if it had the same name on both sides. Instead, "their" branch is collected up as usual, but the SHA-1 and branch name are written into .git/FETCH_HEAD.
(There is a very good reason for this: if git fetch origin master updated your master, you would lose all the new commits you made! So you want it to update only origin/master and/or FETCH_HEAD.)
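A quick way to convince yourself of this is the sketch below. It assumes two scratch repositories (theirs and yours, names made up) that have each gained a commit the other lacks, then runs git fetch origin master and compares SHA-1s.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Their repository with one commit on master, then your clone of it.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git clone -q "$work/theirs" "$work/yours"

# They move ahead; you make a local commit of your own.
git commit -q --allow-empty -m "their new work"
their_tip=$(git rev-parse refs/heads/master)
cd "$work/yours"
git commit -q --allow-empty -m "my local work"

before=$(git rev-parse refs/heads/master)
git fetch -q origin master
after=$(git rev-parse refs/heads/master)
fetched=$(git rev-parse FETCH_HEAD)

echo "my master before: $before"
echo "my master after:  $after"
echo "FETCH_HEAD:       $fetched"

cd /
rm -rf "$work"
```

Your master is the same before and after, so the local commit is safe, while FETCH_HEAD names their new tip.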
The push operation is really very similar to fetch. It's not completely symmetric though: you don't push to a "remote branch", in general, you just push right to a "branch". For instance, when pushing your branch master, your local reference is refs/heads/master, and their local reference is also refs/heads/master. It's not refs/remotes/yoursystem/master, for sure. So the refspecs used for push are often quite a bit simpler.
If you just run git push (or git push origin), this still needs to come up with some refspec(s), though.
There is a (sort of new) control knob in the git config file, push.default, that allows you to configure which references git pushes. In versions of git before 2.0 it defaults to matching; in git 2.0 and later it defaults to simple. There are five possible settings:
nothing: produce an error
current: push the branch you are on to the same name
upstream: push the branch you are on to its upstream name
simple: like upstream, but require that the upstream name match the local name
matching: push all branches that have the same name

Some of these require a bit of further explanation. The "upstream name" is the branch name on the other end. Let's say you have a remote branch named origin/feature, and you made a local tracking branch for it, but called it feature2 because you were already working on a different feature branch (not yet created on origin). So your local feature2 has origin/feature as its upstream (and your feature has no upstream at all). Pushing with upstream will follow the mapping, and push your feature2 to their feature. Pushing with simple will reject the attempt, because the two names differ.
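The feature2 scenario can be reproduced with a throwaway bare repository standing in for origin. This is only a sketch: the paths and the demo identity are assumptions, and push.default is set per command with -c so nothing in your real configuration changes.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A bare repository plays the part of "origin".
git init -q --bare "$work/origin.git"
git init -q "$work/yours"
cd "$work/yours"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git remote add origin "$work/origin.git"
git push -q origin master:master master:feature   # create their master and feature
git fetch -q origin

# Local branch feature2 whose upstream is origin/feature.
git checkout -q -b feature2 --track origin/feature
git commit -q --allow-empty -m "work on feature2"

# simple: the upstream name (feature) differs from the local name (feature2).
if git -c push.default=simple push -q origin 2>/dev/null; then
    simple_result=pushed
else
    simple_result=rejected
fi

# upstream: follow the mapping, so feature2 goes to their feature.
git -c push.default=upstream push -q origin
local_feature2=$(git rev-parse refs/heads/feature2)
remote_feature=$(git -C "$work/origin.git" rev-parse refs/heads/feature)

echo "simple:   $simple_result"
echo "upstream: their feature is now $remote_feature"

cd /
rm -rf "$work"
```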
Hence, if you git push with no refspec, git will look up the default configuration [3] and construct a refspec based on that. For the matching case, it pushes every branch that you and they both have (so, if you both have master and branch, push your master to their master, and your branch to their branch), but does not do anything about branches only one of you has.
If you give some explicit refspec(s), all of this becomes moot: the push operation pushes the refspecs you give it. Moreover, a refspec without a colon means "use the same name on both ends", so master is a short-hand way to write the full long version, refs/heads/master:refs/heads/master.
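As a small illustration, the sketch below pushes once with the short form and once with the long form against a scratch bare repository (paths and identity are made up for the demo). Both update the same reference on the other end.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/origin.git"
git init -q "$work/yours"
cd "$work/yours"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/origin.git"

# Short form: no colon, same name on both ends.
git commit -q --allow-empty -m "first"
first=$(git rev-parse refs/heads/master)
git push -q origin master
after_short=$(git -C "$work/origin.git" rev-parse refs/heads/master)

# Long form: spell out both sides of the refspec.
git commit -q --allow-empty -m "second"
second=$(git rev-parse refs/heads/master)
git push -q origin refs/heads/master:refs/heads/master
after_long=$(git -C "$work/origin.git" rev-parse refs/heads/master)

echo "after short form: $after_short"
echo "after long form:  $after_long"

cd /
rm -rf "$work"
```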
As with a fetch, your git and their git communicate to figure out what repository objects, if any, need to be sent over to accomplish the push.
The git pull operation runs the four-word form of git fetch.
Its first step is to figure out what remote to use. If you name one:
git pull origin master
it takes the name you give it; otherwise it looks at which branch you're on (let's say master), then looks in .git/config to find branch.master.remote (probably origin).
Then, it figures out what branch to use. If you name one, it uses that; otherwise, it uses branch.master.merge, which is the name of the branch at the other end (normally just master again). It then runs git fetch with those arguments.
This means the fetch will bring over only the "interesting" branch, in this case master, and put the SHA-1 in FETCH_HEAD. (If you have git 1.8.4 or newer, it will also update origin/master.)
Finally, pull runs either merge or rebase, depending again on configuration entries and whether you run it with --rebase. The commit you will merge, or rebase to, is the one whose SHA-1 is now stored in FETCH_HEAD.
Note that this only merges or rebases your current branch.
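To make that concrete, here is a sketch that does by hand roughly what git pull origin master does in the merge case: a fetch followed by a merge of FETCH_HEAD. It assumes two scratch repositories (names made up), with a second local branch included to show that it is left alone.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Their repository has master and branch; you clone it and track both.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git branch branch
git clone -q "$work/theirs" "$work/yours"
git -C "$work/yours" branch -q --track branch origin/branch

# They add a commit to each of their branches.
git commit -q --allow-empty -m "second on master"
git checkout -q branch
git commit -q --allow-empty -m "second on branch"
git checkout -q master
their_master=$(git rev-parse refs/heads/master)
their_branch=$(git rev-parse refs/heads/branch)

# You are on master: fetch just that branch, then merge what was fetched.
cd "$work/yours"
git fetch -q origin master
git merge -q FETCH_HEAD
my_master=$(git rev-parse refs/heads/master)
my_branch=$(git rev-parse refs/heads/branch)

echo "my master: $my_master (theirs: $their_master)"
echo "my branch: $my_branch (theirs: $their_branch)"

cd /
rm -rf "$work"
```

Your master catches up with theirs, while your local branch stays where it was until you check it out and pull (or merge) there too.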
[1] As noted in the manual, fetch defaults to a "tag following" trick: it looks at the SHA-1s in the tags, and sees if those are or will be in your repository. For those that are-or-will-be, it brings over that tag. You can turn this off with --no-tags.
[2] Objects are the things that the repository actually stores: "blobs" (files), trees (directories full of files or more directories), commits, and "annotated tags". Each has a unique SHA-1 name.
[3] However, you can override this with per-branch and per-remote configuration, branch.<name>.pushremote and remote.<name>.push. You can make a twisty mass of difficult-to-understand effects by turning a lot of configuration knobs.