Tags: git, git-push, git-pull, git-fetch

Are git pull and push repository wide operations or branch specific?


Trying to get a real handle on git :) Is git pull a repository-wide operation? That is, does it update all of your local branches that track remote branches, or does it only fetch and merge for the currently checked-out branch?

Is the same true for push? What does --all do for push and pull?

Any help would rock!

Also, what does fetch do? Does it grab the info (files inside of the .git folder) for a specific branch? Or is the .git folder consistent across the whole repo? If I do fetch instead of clone, I can't really do anything after that. What do I do after fetching?


Solution

  • TL;DR summary: "it depends".

    The answer is "both and neither", really. Or "it depends". Or something like that!

    First, there are two basic operations to consider: fetch and push. (The pull operation is built on top of fetch: it was originally just a shell script, and although git 2.6 and later implement it in C, it still works the same way. So once you know how fetch works, we can explain pull properly.)

    Both fetch and push have access to entire repositories. But in general, they do not work by sending entire repositories over the wire (or other communications channel). They work based on references.

    The fetch and push operations generally take "refspecs", which are reference-pairs (remote:local and local:remote respectively) plus an optional "force" flag prefix +. However, they can be given just a simple reference, and the force flag can be specified with -f or --force.

    Both commands have been around for a long time and have accumulated a lot of "old stuff". The "modern" way to work with remote repositories is through the thing called a "remote", using git remote add to create them (and git clone creates one called origin by default). These turn into entries in the .git/config file:

    [remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*
        url = ssh://...
    

    The url = line gives the URL for both fetch and push—though there can be an extra pushurl = line if needed, to make pushes go somewhere else. (There are "old ways" to run fetch and push and supply URLs directly, and so on, but let's just ignore all of them ... remotes are much better!) This also supplies refspecs—well, one refspec, in this case—for git fetch.
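To see the entries described above appear, here is a minimal sketch that creates a throwaway repository and adds a remote. The example.com URLs and the directory name mine are made up for illustration; nothing is contacted over the network.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1

git -c init.defaultBranch=master init -q "$tmp/mine"
cd "$tmp/mine"

# Hypothetical URLs; "git remote add" only writes config entries.
git remote add origin ssh://git@example.com/project.git
# Optional: make pushes go somewhere else (adds a pushurl = line).
git remote set-url --push origin ssh://git@example.com/project-push.git

fetch_refspec=$(git config --get remote.origin.fetch)
fetch_url=$(git config --get remote.origin.url)
push_url=$(git config --get remote.origin.pushurl)

printf 'fetch   = %s\nurl     = %s\npushurl = %s\n' \
  "$fetch_refspec" "$fetch_url" "$push_url"
```

The fetch = line is written by git remote add itself; you only need to edit it when you want something other than the default mapping.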

    git ls-remote

    With that out of the way, let's start with another command entirely, git ls-remote. This works like a fetch but without actually fetching anything:

    $ git ls-remote origin
    676699a0e0cdfd97521f3524c763222f1c30a094    HEAD
    222c4dd303570d096f0346c3cd1dff6ea2c84f83    refs/heads/branch
    676699a0e0cdfd97521f3524c763222f1c30a094    refs/heads/master
    d41117433d7b4431a188c0eddec878646bf399c3    refs/tags/tag-foo
    

    This tells us that the remote named origin has three ref-names. Two are branches and one is a tag. (The special HEAD ref has the same SHA-1 as refs/heads/master, so git will guess that the remote is "on branch master" as git status might say. This guessing was a bug of sorts in the older remote protocol: git should be able to say "HEAD is a symbolic ref, pointing to refs/heads/master", so that your end does not have to guess, which matters when two branches have the same SHA-1 as HEAD. Newer versions of git do send this information, and git ls-remote --symref origin HEAD displays it.)
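The listing above can be reproduced locally. The sketch below builds a small stand-in for the remote in a temporary directory (the name theirs and the demo identity are assumptions) and points ls-remote at its path instead of a remote name. The --symref option needs git 2.8 or newer.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for "their" repository: two branches and one tag.
git -c init.defaultBranch=master init -q "$tmp/theirs"
cd "$tmp/theirs"
git commit -q --allow-empty -m "first commit"
git branch branch
git tag tag-foo

# ls-remote accepts a path or URL as well as a remote name.
git ls-remote "$tmp/theirs"
ref_count=$(git ls-remote "$tmp/theirs" | wc -l)

# With --symref, git reports what HEAD points at, so nothing has to be guessed.
head_target=$(git ls-remote --symref "$tmp/theirs" HEAD |
  awk '$1 == "ref:" { print $2 }')
echo "HEAD -> $head_target"
```

The plain listing has four lines (HEAD plus three refs), matching the shape of the example output above.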

    git fetch

    When you run git fetch origin, the fetch operation starts out with the same ls-remote, more or less, and thus sees all the branches and tags. If you use --tags it brings over all the tags too; otherwise it does something fairly complicated[1] that brings over all the branches and some tags. It sees all other references as well, but by default, it does not bring those over: for instance, the remote might have refs/notes/commits, which is used by git notes, but that one does not come over.

    When you alter the refspecs given to git fetch, though, you change what gets brought over. The default is the one right there in .git/config, fetch = +refs/heads/*:refs/remotes/origin/*. This refspec says to bring over all refs/heads/* references—all branches—and store them locally under refs/remotes/origin/ using the same name as the branch-name on the remote. Using --tags adds one additional refspec: refs/tags/*:refs/tags/*. That's how git brings over all their tags: everything matching refs/tags/*, which is all the tags, goes into your local refs/tags/ under the matching name.

    (You can add more fetch = lines and bring more stuff over. See this answer on "remote tags" for an example.)
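As a rough illustration of how the refspec controls what comes over, the sketch below fetches once with the default refspec and once after adding a second fetch = line for notes. The repositories theirs and mine are temporary stand-ins, and has_ref is a small helper defined only for this example.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository: two branches plus a note (stored in refs/notes/commits).
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first commit"
git -C "$tmp/theirs" branch branch
git -C "$tmp/theirs" notes add -m "a note" HEAD

# "My" repository: empty, with theirs configured as origin.
git -c init.defaultBranch=master init -q "$tmp/mine"
cd "$tmp/mine"
git remote add origin "$tmp/theirs"

# Helper for this example only: prints yes or no for a full ref name.
has_ref() { git show-ref --verify --quiet "$1" && echo yes || echo no; }

# Default refspec: their branches land under refs/remotes/origin/.
git fetch -q origin
tracking_master=$(has_ref refs/remotes/origin/master)
tracking_branch=$(has_ref refs/remotes/origin/branch)
local_branches=$(git for-each-ref refs/heads)   # stays empty
notes_before=$(has_ref refs/notes/commits)

# A second fetch = line brings the notes over as well.
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'
git fetch -q origin
notes_after=$(has_ref refs/notes/commits)

echo "origin/master: $tracking_master, origin/branch: $tracking_branch"
echo "notes before: $notes_before, notes after: $notes_after"
```

Note that neither fetch creates or moves anything under refs/heads/ in your repository.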

    Now, just bringing over the reference name won't do much good unless git also brings over any required underlying objects,[2] as identified by their SHA-1s. Let's say you already have 676699a..., but not 222c4dd.... (You're up to date on master but not on branch. Maybe you don't even have branch branch yet.) The fetch operation needs to bring over that commit for sure. That commit probably needs various files, and previous commits, and so on. So your git fetch communicates with the thing on the remote that's looking at the other git repository, and they have a little conversation, where each one tells the other what SHA-1s they have now, and which ones they still need. If yours needs 222c4dd..., it asks the other end "what else do I need to use 222c4dd...", checks to see if it has those, adds them to its list if not, checks those in more detail once added, and so on.

    Having finally agreed on what to exchange, their git sends you the objects—usually in a "thin pack" if possible (the details depend on the transport)—and your git unpacks and/or repacks them as needed, and then updates your local references for any new branches, tags, or other references brought over. (By default, your git just stores their branches in your "remote branches"—your copy of "what they had the last time I talked with them"—but updates your tags. That is, there are no "remote tags", just "remote branches".)
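The difference between how branches and tags are stored can be checked with a small experiment. This is only a sketch: the directory names with-tags and no-tags are made up, and the second repository fetches with --no-tags to show that the tag is what the default fetch adds.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository: one branch with a tag on its tip commit.
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first commit"
git -C "$tmp/theirs" tag tag-foo

# Two empty repositories that both use theirs as origin.
for name in with-tags no-tags; do
  git -c init.defaultBranch=master init -q "$tmp/$name"
  git -C "$tmp/$name" remote add origin "$tmp/theirs"
done

git -C "$tmp/with-tags" fetch -q origin
git -C "$tmp/no-tags" fetch -q --no-tags origin

# Their branch is stored as a remote branch; their tag becomes a plain local tag.
git -C "$tmp/with-tags" for-each-ref --format='%(refname)'
tags_with=$(git -C "$tmp/with-tags" tag -l)
tags_without=$(git -C "$tmp/no-tags" tag -l)
branch_with=$(git -C "$tmp/with-tags" for-each-ref --format='%(refname)' \
  refs/remotes/origin/master)
```

In the first repository the tag shows up as refs/tags/tag-foo, not under refs/remotes/origin/, while the branch shows up only as refs/remotes/origin/master.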

    An important git fetch special case

    As a special case, if you give git fetch any arguments beyond the name of the remote—as in:

    git fetch origin master
    

    for instance—these refspecs override the ones in the config file, and (in git versions predating 1.8.4) prevent updating "remote branches". This generally limits what's fetched, sometimes quite a bit. (In 1.8.4 and later, they still limit the fetch, but the remote-branch gets updated anyway, which makes more sense.) Here, a refspec that is missing a colon—like the one above—is not treated as if it had the same name on both sides. Instead, "their" branch is collected up as usual, but the SHA-1 and branch name are written into .git/FETCH_HEAD.

    (There is a very good reason for this: if git fetch origin master updated your master, you would lose all the new commits you made! So you want it to update only origin/master and/or FETCH_HEAD.)
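A small experiment makes the special case concrete. The sketch below assumes git 1.8.5 or newer (it uses git -C) and uses temporary repositories named theirs and mine; it shows which refs move after git fetch origin master.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first commit"
git clone -q "$tmp/theirs" "$tmp/mine"

# They move ahead by one commit after we cloned.
git -C "$tmp/theirs" commit -q --allow-empty -m "second commit"
their_tip=$(git -C "$tmp/theirs" rev-parse refs/heads/master)

cd "$tmp/mine"
my_tip_before=$(git rev-parse refs/heads/master)
git fetch -q origin master

fetch_head=$(git rev-parse FETCH_HEAD)                  # their new commit
tracking=$(git rev-parse refs/remotes/origin/master)    # updated since git 1.8.4
my_tip_after=$(git rev-parse refs/heads/master)         # untouched

echo "FETCH_HEAD:    $fetch_head"
echo "origin/master: $tracking"
echo "master:        $my_tip_after"
```

Your own master only moves when you merge or rebase afterwards, for example with git merge FETCH_HEAD.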

    git push

    The push operation is really very similar to fetch. It's not completely symmetric though: you don't push to a "remote branch", in general, you just push right to a "branch". For instance, when pushing your branch master, your local reference is refs/heads/master, and their local reference is also refs/heads/master. It's not refs/remotes/yoursystem/master, for sure. So the refspecs used for push are often quite a bit simpler.

    If you just run git push (or git push origin), this still needs to come up with some refspec(s), though.

    There is a control knob in the git config file, push.default, that allows you to configure which references git pushes. Before git 2.0 it defaulted to matching; since git 2.0 the default is simple. There are five possible settings:

    • nothing: produce an error
    • current: push the branch you are on to the same name
    • upstream: push the branch you are on to its upstream name
    • simple: like upstream, but require that the upstream name match the local name
    • matching: push all branches that have the same name

    Some of these require a bit of further explanation. The "upstream name" is the branch name on the other end. Let's say you have a remote branch named origin/feature, and you made a local tracking branch for it, but called it feature2 because you were already working on a different feature branch (not yet created on origin). So your local feature2 has origin/feature as its upstream (and your feature has no upstream at all). Pushing with upstream will follow the mapping, and push your feature2 to their feature. Pushing with simple will reject the attempt, because the two names differ.

    Hence, if you git push with no refspec, git will look up the default configuration[3] and construct a refspec based on that. For the matching case, it pushes every branch that you and they both have (so, if you both have master and branch, push your master to their master, and your branch to their branch), but does not do anything about branches only one of you has.
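The feature2 example and the matching case can be tried out in a sandbox. The sketch below is an illustration under assumptions: origin.git is a temporary bare repository, the setting is passed with -c on each command instead of being stored in the config file, and error output from the rejected push is discarded.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q --bare "$tmp/origin.git"
git -c init.defaultBranch=master init -q "$tmp/mine"
cd "$tmp/mine"
git remote add origin "$tmp/origin.git"
git commit -q --allow-empty -m "first commit"

# Create master and feature on the other end, then track feature as feature2.
git push -q origin master:master master:feature
git fetch -q origin
git checkout -q -b feature2 --track origin/feature
git commit -q --allow-empty -m "work on feature2"

# simple: refused, because feature2 and its upstream feature differ in name.
if git -c push.default=simple push -q origin 2>/dev/null; then
  simple_result=pushed
else
  simple_result=rejected
fi

# upstream: follows the mapping, so feature2 goes to their feature.
git -c push.default=upstream push -q origin
their_feature=$(git -C "$tmp/origin.git" rev-parse refs/heads/feature)
my_feature2=$(git rev-parse refs/heads/feature2)

# matching: pushes master (both sides have it) even while feature2 is checked out.
git checkout -q master
git commit -q --allow-empty -m "work on master"
git checkout -q feature2
git -c push.default=matching push -q origin
their_master=$(git -C "$tmp/origin.git" rev-parse refs/heads/master)
my_master=$(git rev-parse refs/heads/master)
feature2_on_origin=$(git -C "$tmp/origin.git" show-ref --verify --quiet \
  refs/heads/feature2 && echo yes || echo no)

echo "simple: $simple_result, feature2 exists on origin: $feature2_on_origin"
```

So matching is the one setting that makes a bare git push act on several branches at once; the others act on the current branch only.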

    If you give some explicit refspec(s), all of this becomes moot: the push operation pushes the refspecs you give it. Moreover, a refspec without a colon means "use the same name on both ends", so master is a short-hand way to write the full long version, refs/heads/master:refs/heads/master.
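Here is a hedged sketch of explicit push refspecs: the short form, the equivalent long form, a different name on the other end, and the + force prefix mentioned earlier. The branch name review and the temporary bare repository origin.git are invented for the example.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q --bare "$tmp/origin.git"
git -c init.defaultBranch=master init -q "$tmp/mine"
cd "$tmp/mine"
git remote add origin "$tmp/origin.git"
their() { git -C "$tmp/origin.git" rev-parse "$1"; }   # helper for this example

git commit -q --allow-empty -m "first commit"
first=$(git rev-parse HEAD)
git push -q origin master                               # short form
after_short=$(their refs/heads/master)

git commit -q --allow-empty -m "second commit"
second=$(git rev-parse HEAD)
git push -q origin refs/heads/master:refs/heads/master  # same thing, spelled out
after_long=$(their refs/heads/master)

git push -q origin master:review                        # local:remote, names differ
their_review=$(their refs/heads/review)

# Rewrite the last commit: a plain push is now a non-fast-forward.
git commit -q --amend --allow-empty -m "second commit, rewritten"
if git push -q origin master 2>/dev/null; then
  plain_result=pushed
else
  plain_result=rejected
fi
git push -q origin +master                              # + is the force flag
their_master=$(their refs/heads/master)

echo "plain push after rewrite: $plain_result"
```

Only the refs named on the command line are touched; their review branch still points at the original second commit after the forced push of master.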

    As with a fetch, your git and their git communicate to figure out what repository objects, if any, need to be sent over to accomplish the push.

    git pull

    The git pull operation starts by running the four-word form of git fetch, that is, git fetch followed by a remote name and a branch name.

    Its first step is to figure out what remote to use. If you name one:

    git pull origin master
    

    it takes the name you give it; otherwise it looks at which branch you're on (let's say master), then looks in .git/config to find branch.master.remote (probably origin).

    Then, it figures out what branch to use. If you name one, it uses that; otherwise, it uses branch.master.merge, which is the name of the branch at the other end (normally just master again). It then runs git fetch with those arguments.

    This means the fetch will bring over only the "interesting" branch, in this case master, and put the SHA-1 in FETCH_HEAD. (If you have git 1.8.4 or newer, it will also update origin/master.)

    Finally, pull runs either merge or rebase, depending again on configuration entries and whether you run it with --rebase. The commit you will merge, or rebase to, is the one whose SHA-1 is now stored in FETCH_HEAD.

    Note that this only merges or rebases your current branch.
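To check that pull touches only the current branch, the sketch below sets up two branches on both sides, lets the other side advance both, and then pulls while on master. The repositories are temporary stand-ins, and --ff-only is passed just to keep the merge step unambiguous across git versions.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore personal and system git config so the demo behaves the same everywhere.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/xdg" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first commit"
git -C "$tmp/theirs" branch branch
git clone -q "$tmp/theirs" "$tmp/mine"
git -C "$tmp/mine" branch -q --track branch origin/branch

# They add one commit to each branch.
git -C "$tmp/theirs" commit -q --allow-empty -m "second commit on master"
git -C "$tmp/theirs" checkout -q branch
git -C "$tmp/theirs" commit -q --allow-empty -m "new commit on branch"
git -C "$tmp/theirs" checkout -q master
their_master=$(git -C "$tmp/theirs" rev-parse refs/heads/master)
their_branch=$(git -C "$tmp/theirs" rev-parse refs/heads/branch)

cd "$tmp/mine"                      # currently on master
old_branch=$(git rev-parse refs/heads/branch)

# Fetch master only, then fast-forward the current branch to FETCH_HEAD.
git pull -q --ff-only origin master
fetch_head=$(git rev-parse FETCH_HEAD)
my_master=$(git rev-parse refs/heads/master)
my_branch=$(git rev-parse refs/heads/branch)
tracking_after_pull=$(git rev-parse refs/remotes/origin/branch)

# A full fetch updates origin/branch, but still not the local branch.
git fetch -q origin
tracking_after_fetch=$(git rev-parse refs/remotes/origin/branch)
my_branch_after_fetch=$(git rev-parse refs/heads/branch)

echo "master updated: $my_master"
echo "branch still at: $my_branch_after_fetch"
```

To bring the local branch up to date you would check it out and merge or rebase onto origin/branch, or run git pull again while on it.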


    [1] As noted in the manual, fetch defaults to a "tag following" trick: it looks at the SHA-1s in the tags, and sees if those are or will be in your repository. For those that are-or-will-be, it brings over that tag. You can turn this off with --no-tags.

    [2] Objects are the things that the repository actually stores: "blobs" (files), trees (directories full of files or more directories), commits, and "annotated tags". Each has a unique SHA-1 name.

    [3] However, you can override this with per-branch and per-remote configuration: branch.<name>.pushRemote and remote.<name>.push. You can make a twisty mass of difficult-to-understand effects by turning a lot of configuration knobs.