I've been reading about the git pull and git fetch commands and the difference between them.
I agree there is a difference between the 2 commands when we have master branches both locally and remotely and therefore pull will integrate whatever changes we fetched.
But what if new branches have been pushed to the remote that have never been fetched before? If we use only git fetch rather than git pull, what is the difference internally, from Git's point of view, after we have fetched or pulled those branches? Are the new branches not integrated if we only run git fetch?
I wanted to test it and did the following:
I have a remote repository which I cloned twice; let's call those local repos repo 1 and repo 2. repo 1 will create new branches and push them to the remote, and repo 2 will pull/fetch them from the remote.
I created and pushed a new branch, side_branch_1, to the remote repo from repo 1. Then I went back to repo 2 and used git pull. Then I ran git branch -a and saw the new branch as remotes/origin/side_branch_1. I also opened the .git/FETCH_HEAD file and saw the line for that branch: <sha-1> not-for-merge branch side_branch_1 of <url>.
After that, in repo 1 I created and pushed side_branch_2, and in repo 2 I used git fetch this time. Then I ran git branch -a again and saw the new branch as remotes/origin/side_branch_2. I also opened the .git/FETCH_HEAD file again and saw the line for that branch: <sha-1> not-for-merge branch side_branch_2 of <url>.
Is there no difference for new branches whether I pull or fetch? And if yes, then what is the difference from Git's internal point of view? I ask because side_branch_1 is tagged as not-for-merge even though it has been pulled. Why? What am I missing?
git pull
means run git fetch
, then run a second Git command. The first step—git fetch
—does not affect any of your branches. It does not change anything you're working on, if you're working on anything.
The second step, which defaults to running git merge
, affects your current branch. It does not create a new branch, so in general, any new branch names created in the other Git are not relevant unless you explicitly named them on your git pull
command.
Assuming you run git pull
with no extra arguments, the remote on which git pull
runs git fetch
is the remote associated with the current branch, and the commit that is used for rebase-or-merge is that associated with the upstream of the current branch as updated by the git fetch
step. Git imposes limitations on the upstream setting for a branch name in your repository: in particular, if your Git is not yet aware that some name exists in the other Git, your Git won't let you set it as the upstream. So "new" branches—which we haven't properly defined, really—are not relevant.
If you add more arguments to your git pull
command line, the picture gets more complicated.
Is there no difference for new branches whether I pull or fetch?
Git pull always means: run git fetch
, then run a second Git command. So obviously these are different because git fetch
does not run a second Git command. It is irrelevant here whether or not the fetch step sees branch names that your Git has not seen before.
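To make that concrete for the experiment in the question, here is a minimal sketch that rebuilds a similar setup in a scratch directory. The names repo1, repo2, and remote.git are my own stand-ins, and I force the initial branch to be called master; treat it as an illustration under those assumptions, not as the only way to test this.

```shell
#!/bin/sh
# Sketch: does "git fetch" of a never-before-seen branch touch local branches?
# Assumed names: repo1, repo2, remote.git (all hypothetical, in a temp dir).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

# repo1 makes the first commit; remote.git is a bare copy that stands in
# for the remote; repo2 is a second clone of that remote.
git init -q repo1
git -C repo1 symbolic-ref HEAD refs/heads/master
git -C repo1 commit -q --allow-empty -m "initial"
git clone -q --bare repo1 remote.git
git -C repo1 remote add origin "$root/remote.git"
git clone -q remote.git repo2

# repo1 publishes a brand-new branch.
git -C repo1 checkout -q -b side_branch_1
git -C repo1 commit -q --allow-empty -m "work on side_branch_1"
git -C repo1 push -q origin side_branch_1

# repo2 fetches; compare its own master before and after.
master_before=$(git -C repo2 rev-parse master)
git -C repo2 fetch -q
master_after=$(git -C repo2 rev-parse master)

# Which names exist in repo2 for the new branch?
remote_tracking=$(git -C repo2 for-each-ref --format='%(refname)' refs/remotes/origin/side_branch_1)
local_branch=$(git -C repo2 for-each-ref --format='%(refname)' refs/heads/side_branch_1)

echo "master: $master_before -> $master_after"
echo "remote-tracking name: ${remote_tracking:-<none>}"
echo "local branch name: ${local_branch:-<none>}"
```

The fetch creates the remote-tracking name and leaves repo2's own branch names alone. Running git pull instead would do the same fetch and then run the second command against the current branch, which has nothing to do with side_branch_1 here.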
And if yes then what is the difference from Git internal point of view?
Here's where you need to be closely aware of how Git really works. To keep this answer short(ish), I'll just say: see my other answers for lots of detail, but:

Each commit is identified by a unique hash ID, which is what git log shows you: commit 1c56d6f57adebf2a0ac910ca62a940dc7820bb68 for instance.

Each commit stores a snapshot of all of your files. The files inside each commit are in a special, read-only, Git-only, compressed format, frozen for all time.
Each commit also stores some metadata: information about the commit that isn't a file saved with the commit, but rather, holds stuff like who made the commit, when, and why (their log message). In this metadata, each commit stores the hash ID of its immediate parent commit (for most commits; some store two or more parents, and these are merge commits, and at least one will be the very first commit in the repository and therefore won't have a parent).
A branch name like master
simply holds the raw hash ID of the last commit in the chain. Hence if you have a branch named master
and some commits, master
holds some hash ID H
, and commit H
points back to some earlier commit G
, which points back to a yet-earlier commit F
, and so on:
... <-F <-G <-H <--master
To add a commit to a branch, we select that branch name, which selects that commit. That takes the frozen, Git-only files out of the commit into an area where we can work on them. We work on them as desired and eventually tell Git: make a new commit. Git makes the new commit point back to the one we got out, saving a new snapshot of all of our files, and then, having made the new commit, changes the branch name so that it points to the new commit:
...--F--G--H--I <-- master
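A small sketch of that last point, assuming a throwaway repository named demo (a hypothetical name) with the initial branch forced to master: the branch name just holds a hash ID, and making a commit moves the name.

```shell
#!/bin/sh
# Sketch: a branch name holds one hash ID; a new commit moves the name.
# "demo" is a hypothetical scratch repository.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "H"
h=$(git rev-parse master)            # hash ID currently stored under "master"

git commit -q --allow-empty -m "I"
i=$(git rev-parse master)            # "master" now holds the new commit's hash ID
parent_of_i=$(git rev-parse 'master^')   # I's metadata records H as its parent

echo "H           = $h"
echo "I           = $i"
echo "parent of I = $parent_of_i"
```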
Branch names are not the only kind of names that can remember commit hash IDs. More than one name can identify any single commit, too.
The git clone command works by calling up another Git repository. You tell your system:

1. Create a new, empty directory (or use the empty directory I tell git clone to).
2. Create a new, empty repository in this directory: git init.
3. Add a remote named origin (or whatever other name you tell Git to use): git remote add.
4. Run any additional configuration needed, based on options given to the git clone command.
5. Call up the other Git at origin (at the stored URL) and have it list out its branch (and other) names and their raw hash IDs. Then, ask that Git for the commits ... in this case, all of them. Copy all of its commits over into our otherwise-empty repository. Take its branch names and rename them: make its master become our origin/master, for instance, and make its develop become our origin/develop, and so on.
6. Create one branch name (the one I asked for, or by default usually master): use the renamed origin/ version of the name to make a branch name, and point that branch name at the same commit as my origin/ version of the name.

So after the initial git clone, you have remote-tracking names, usually of the form origin/*, for each of the other Git's branch names. You then have one branch name of your own, usually master, pointing to the same commit as your origin/master. If they have master and develop, perhaps you now have:
...--G--H   <-- master, origin/master
         \
          I--J   <-- origin/develop
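The clone sequence can be approximated by hand, which makes the separate steps visible. This is only a sketch: upstream and byhand are hypothetical directory names, the "other Git" is a local scratch repository, and the optional extra-configuration step is skipped.

```shell
#!/bin/sh
# Sketch: doing by hand roughly what "git clone" does.
# "upstream" and "byhand" are hypothetical directory names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

# A small repository to clone from, with branches master and develop.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "H"
git -C upstream branch develop

# New empty directory, then a new empty repository in it.
mkdir byhand
cd byhand
git init -q
# Add a remote named origin, which stores the URL.
git remote add origin "$root/upstream"
# Get their commits and create our origin/* remote-tracking names.
git fetch -q origin
# Create one branch name of our own, pointing where origin/master points.
git checkout -q -b master origin/master

head_branch=$(git symbolic-ref --short HEAD)
mine=$(git rev-parse master)
theirs=$(git rev-parse origin/master)
tracking_develop=$(git for-each-ref --format='%(refname)' refs/remotes/origin/develop)

echo "HEAD is attached to: $head_branch"
echo "master = $mine, origin/master = $theirs"
echo "also have: $tracking_develop"
```

Note that no local develop exists at the end: only the remote-tracking name origin/develop was created for it.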
Step 5, in the six-step git clone
sequence above, is in fact git fetch
. However, rather than obtain every commit, what git fetch
does is talk with the other Git, to see which commits they have that you don't. During the initial clone, you don't have any commits, so that's just automatically all of theirs. Later, it's their new ones.
When you run git fetch
later, if they still have their master
identifying commit H
and their develop
identifying commit J
, your Git will look in your repository, using the real hash IDs that H
and J
stand in for, and see that you already have them. Your Git does not need to get any new commits. If they've added another commit to their develop
, though, they will have new commit K
and you'll get it:
...--G--H   <-- master, origin/master
         \
          I--J   <-- origin/develop
              \
               K
and then your git fetch
will update your remote-tracking name origin/develop
to point to commit K
:
...--G--H   <-- master, origin/master
         \
          I--J--K   <-- origin/develop
If they do something unusual and force their develop
back one step and you run git fetch
again, you will keep commit K
for a while—typically at least 30 days by default—but your Git will adjust your origin/develop
to match their develop
:
...--G--H   <-- master, origin/master
         \
          I--J   <-- origin/develop
              \
               K   [no name: hard to find!]
Git in general finds commits by starting from some name—whether it's your branch name, or your remote-tracking name, or any other name—and then working backwards.
(There are hidden logs of previously-stored hash IDs for each name, by which you can find K
. The entries in these logs eventually expire, and that's where the 30-day limit comes from: after 30 days, the entry retaining K
expires. Some time after that, Git's garbage collector, git gc
, will throw K
out for real, if nobody has made a new name to protect it.)
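Here is a sketch of the "force back one step" case and of finding K again through those logs. The repository names are hypothetical, and I start with develop as the only branch to keep it short. It relies on the default configuration, where a non-bare repository keeps reflogs for its remote-tracking names.

```shell
#!/bin/sh
# Sketch: after the other Git rewinds develop, our reflog still remembers K.
# repo1, repo2, remote.git are hypothetical names in a temp directory.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

git init -q repo1
git -C repo1 symbolic-ref HEAD refs/heads/develop
git -C repo1 commit -q --allow-empty -m "J"
git clone -q --bare repo1 remote.git
git -C repo1 remote add origin "$root/remote.git"
git clone -q remote.git repo2

# They add commit K; we fetch it, so origin/develop moves to K.
git -C repo1 commit -q --allow-empty -m "K"
git -C repo1 push -q origin develop
git -C repo2 fetch -q
k=$(git -C repo2 rev-parse origin/develop)

# They force develop back one step; we fetch again.
git -C repo1 reset -q --hard HEAD~1
git -C repo1 push -q --force origin develop
git -C repo2 fetch -q
j=$(git -C repo2 rev-parse origin/develop)

# The previous value of origin/develop, taken from its reflog.
previous=$(git -C repo2 rev-parse 'origin/develop@{1}')
k_type=$(git -C repo2 cat-file -t "$k")   # K is still in our repository

echo "origin/develop now:    $j"
echo "origin/develop before: $previous"
```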
Running git fetch
like this, with no name at all—defaulting to origin
, usually—or with just the name of the remote such as origin
, will—as long as you haven't set things up specially—obtain all of the branch names from the other Git, and create or update all of your remote-tracking names accordingly. However, setting up something called a single-branch clone configures your Git differently, so that git fetch
only updates a single remote-tracking name. You can reconfigure this later, or override the set of names to update using a refspec, but we won't go into further detail here.
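The setting involved is the remote's fetch refspec. As a rough illustration (upstream, full, and single are hypothetical directory names), compare what an ordinary clone and a single-branch clone store:

```shell
#!/bin/sh
# Sketch: the fetch refspec of a normal clone vs. a single-branch clone.
# "upstream", "full", and "single" are hypothetical directory names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "H"
git -C upstream branch develop

git clone -q upstream full
git clone -q --single-branch upstream single

full_refspec=$(git -C full config --get remote.origin.fetch)
single_refspec=$(git -C single config --get remote.origin.fetch)
single_develop=$(git -C single for-each-ref --format='%(refname)' refs/remotes/origin/develop)

echo "full:   $full_refspec"
echo "single: $single_refspec"
echo "single clone's origin/develop: ${single_develop:-<none>}"
```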
git fetch; let's start using a branch name

Again, Git's fetch is the part that obtains new commits from the other Git. Having obtained new commits, if there were some to obtain, git fetch adjusts your remote-tracking names. It has no effect on any of your branch names. Your branch names are all undisturbed.
If you never have any branch names of your own—which would be weird, though it is possible to do this—and never do any work on your own, which is less weird and sensible for certain applications (archival storage, for instance), that would suffice. But you probably do use branches.
Let's say you make your own branch name, dave
or whatever you like. Let's say you make this name point to existing commit H
:
...--G--H   <-- dave, master, origin/master
         \
          I--J--K   <-- origin/develop
Now that you have more than one branch name, we'd like to have Git remember which one you're actually using. We'll attach the special name HEAD
to one of them:
...--G--H   <-- dave (HEAD), master, origin/master
         \
          I--J--K   <-- origin/develop
So now we can tell that you're using the name dave
and commit H
. Three names, dave
and master
and origin/master
, all identify commit H
right now.
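As a quick sketch of this state (in a hypothetical scratch repository named demo, without a remote, so there is no origin/master here), you can ask Git which name HEAD is attached to and confirm that two names select the same commit:

```shell
#!/bin/sh
# Sketch: two branch names on one commit, with HEAD attached to one of them.
# "demo" is a hypothetical scratch repository; "dave" is the example name.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "H"

git branch dave          # a new name pointing to the existing commit H
git checkout -q dave     # attach HEAD to the name dave

current=$(git symbolic-ref --short HEAD)
dave_commit=$(git rev-parse dave)
master_commit=$(git rev-parse master)

echo "HEAD is attached to: $current"
echo "dave = $dave_commit, master = $master_commit"
```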
We mentioned above that the files saved in commits are in a special, read-only, Git-only, compressed and frozen format that only Git can use. So Git has copied these files out, into both Git's index and a work area for you. The work area is your working tree or work-tree. It has ordinary files stored in your computer's ordinary format.
You make new commits—usually anyway—by manipulating these ordinary files, then using git add
to copy them back into Git's index. This re-compresses the file into the frozen format, ready to go into a new commit. When you run git commit
, Git will package up the files that are in its index at that time. Hence we can say that the main function of the index is to store what you propose to put into your next commit. (It has other functions as well but we won't get into them here.)
Eventually you have your files in shape, and git add
-ed, and you run git commit
. Git collects the appropriate metadata and writes out a new commit, which assigns the new commit its unique hash ID. Git then stores the new commit's hash ID into the current branch name, giving us:
          L   <-- dave (HEAD)
         /
...--G--H   <-- master, origin/master
         \
          I--J--K   <-- origin/develop
You could equally well work on master
, or develop
that starts out pointing to commit K
, or whatever, but one way or another, you make a new commit, and it points back to whatever commit you told Git to use to start with.
Now, if you run git fetch
and they, whoever they are, made or otherwise acquired new commits you have not yet seen, these new commits have been added on to their branches. Your Git sees them in their repository, sees that you do not have them yet, and gets them. Let's draw one (and stop drawing I-J-K
as they're in the way, but the letters are used up so I'll go with M
here next):
          L   <-- dave (HEAD)
         /
...--G--H   <-- master
         \
          M   <-- origin/master
You might like to incorporate their new commit somehow.
Exactly how you incorporate their new commit is up to you. You could, for instance:

- run git checkout master and then git merge origin/master, or
- run git merge origin/master right now while on commit L on branch dave,

or do any number of other things.

If you run:

git checkout master; git merge origin/master

though, your Git will do what Git calls a fast-forward merge. This is not a merge at all (it's somewhat poorly named), but it has this effect:
          L   <-- dave
         /
...--G--H--M   <-- master (HEAD), origin/master
In fact, if you run git checkout master; git rebase origin/master
, the same thing happens in this particular case. In other cases, different things may happen.
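The fast-forward case can be reproduced with a sketch like the following. The repository names are hypothetical, and the commit messages H, L, and M only mirror the letters in the drawings.

```shell
#!/bin/sh
# Sketch: fetch brings in M; merging origin/master into master fast-forwards.
# repo1, repo2, remote.git are hypothetical names in a temp directory.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

git init -q repo1
git -C repo1 symbolic-ref HEAD refs/heads/master
git -C repo1 commit -q --allow-empty -m "H"
git clone -q --bare repo1 remote.git
git -C repo1 remote add origin "$root/remote.git"
git clone -q remote.git repo2

# We make commit L on our own branch dave.
git -C repo2 checkout -q -b dave
git -C repo2 commit -q --allow-empty -m "L"
dave_before=$(git -C repo2 rev-parse dave)

# They make commit M on their master.
git -C repo1 commit -q --allow-empty -m "M"
git -C repo1 push -q origin master

# Fetch, then incorporate M into our master.
git -C repo2 fetch -q
git -C repo2 checkout -q master
git -C repo2 merge -q origin/master

master_now=$(git -C repo2 rev-parse master)
tracking_now=$(git -C repo2 rev-parse origin/master)
merge_commits=$(git -C repo2 rev-list --count --merges master)
dave_after=$(git -C repo2 rev-parse dave)

echo "master = $master_now, origin/master = $tracking_now"
echo "merge commits on master: $merge_commits"
```

No merge commit is made: the name master simply slides forward to M, and dave still points to L.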
Where git pull comes in

As a rule, once you've brought new commits over from some other Git with git fetch
, you tend to want to do something with them. If you're on your master
and they have updated their master
, the thing you might want to do is update your master
. The two most common ways to do that are to run either git merge
or git rebase
.
The git pull
command can be told to run either of those as its second command. The default is for it to run git merge
. Both git merge
and git rebase
operate on the current branch. That is, they look at the special name HEAD
. As long as that is attached to some branch name—as it normally is—that is the branch name of yours that they will affect. They make changes to Git's index and to your work-tree; both may change which commit is selected by the current branch name; git merge
may make a new merge commit, or perform a fast-forward operation, or sometimes, do nothing.
One of the parts I don't like about git pull
is that you do not always know, when you hit Enter, exactly what commits git fetch
will end up fetching, and where it may move any remote-tracking names. But you're dead set on running git merge
or git rebase
using those new commits and updated names. (This is technically off a bit, as we'll see—it doesn't use updated origin/*
names directly—but it's close enough here.)
Even if the new commits aren't something you want to use to affect your current branch, you're going to have this happen. You can't tell if it will happen. You could use some viewer to inspect the other Git repository first, but what happens if you view it, and then just before you press Enter, someone else changes things in that other repository? Still, people like this a lot, and use it all the time, so let's get to your detailed questions.
I also opened the .git/FETCH_HEAD file again and saw the line for that branch: <sha-1> not-for-merge branch side_branch_2 of <url>.
Here's the historical secret (or not so secret) about git fetch
and git pull
: they are so old that git pull
itself existed before remote-tracking names like origin/master
did. Remotes and remote-tracking names were invented some time between Git version 1.4 and 1.5, and there was some fumbling around with different ideas. The git pull
command kept working the way people wanted it to, all throughout these transitional times as the newfangled remotes and remote-tracking names were being developed.
To avoid having to change too much code too often, and/or because remotes and remote-tracking names didn't exist yet, git fetch
has always written everything into .git/FETCH_HEAD
. To let the early git pull
scripts figure out which commit hash ID to give to git merge
, git fetch
notes which one of our branch names we're using now—that's the "where is HEAD attached" check—and what name(s) to use from the other Git. It then marks each .git/FETCH_HEAD
line with not-for-merge
, or doesn't mark it, depending on the arguments you gave to git fetch
.
When you run git pull
, you can give a bunch of arguments to the git pull
command:
git pull # no arguments at all
git pull origin # just a remote
git pull origin master # a remote and a branch name *on the remote*
Back when git pull
literally ran git fetch
, it passed these arguments on to git fetch
. It now has git fetch
built into it, but it still works the same. If you give one or more branch names here, those are the ones that git fetch
doesn't mark as not-for-merge
in the .git/FETCH_HEAD
file.
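You can watch the marking change with a sketch like this one, which mirrors the setup in the question (repo1, repo2, and remote.git are hypothetical names in a scratch directory). It counts not-for-merge lines in .git/FETCH_HEAD after a plain fetch and after a fetch that names the branch.

```shell
#!/bin/sh
# Sketch: how the arguments to "git fetch" affect not-for-merge in FETCH_HEAD.
# repo1, repo2, remote.git are hypothetical names in a temp directory.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

git init -q repo1
git -C repo1 symbolic-ref HEAD refs/heads/master
git -C repo1 commit -q --allow-empty -m "initial"
git clone -q --bare repo1 remote.git
git -C repo1 remote add origin "$root/remote.git"
git clone -q remote.git repo2

git -C repo1 checkout -q -b side_branch_1
git -C repo1 commit -q --allow-empty -m "work on side_branch_1"
git -C repo1 push -q origin side_branch_1

# Plain fetch while on master: side_branch_1 is not the branch to merge.
git -C repo2 fetch -q
plain_marked=$(grep -c "not-for-merge.*side_branch_1" repo2/.git/FETCH_HEAD || true)

# Fetch that names the branch: its line is no longer marked.
git -C repo2 fetch -q origin side_branch_1
named_marked=$(grep -c "not-for-merge" repo2/.git/FETCH_HEAD || true)

echo "marked lines for side_branch_1 after plain fetch: $plain_marked"
echo "marked lines after 'git fetch origin side_branch_1': $named_marked"
```

This is why side_branch_1 showed up as not-for-merge after a plain git pull in the question: the marking describes what the second command will be handed, and a pull run from master with no arguments is not going to merge side_branch_1.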
Similarly, when git pull
was still a shell script—it was rewritten in C relatively recently—this is how git pull
decided which hash ID to pass to git merge
or, if you choose git rebase
as your second command, to git rebase
. What it does now is more obscure. Since the fetch part is now built in as C-coded function calls, it can just retain the raw hash IDs in memory.
In Git version 1.8.4, the Git folks decided that git fetch origin master
should update origin/master
. Before that, git fetch origin
would update all remote-tracking names, but git fetch origin master
would update none. From Git 1.8.4 onward, git fetch origin master
updates origin/master
. It still does not update other remote-tracking origin/*
names, because it does not bring over commits corresponding to any updated names. (It could still update the remote-tracking names in some cases, but it just doesn't.)
The git fetch that git pull runs:

- is given mostly the same arguments you gave to git pull, so that git pull xyzzy one two three runs git fetch xyzzy one two three. "Mostly" is only here because some options affect which second command to use, and/or are eaten by git pull itself, and/or are passed to the second command instead of being passed to git fetch.
- still writes .git/FETCH_HEAD, in case you are still using the old git pull shell scripts.

In general, git fetch
is safe to run at any time. (You can configure it to be unsafe, if you really wish, by setting remote.<name>.fetch
inappropriately or passing an unsafe refspec argument. It's worth noting, though, that git fetch
has built-in safety checks even if you do this. The old pull
script turns them off!)
The subsequent git merge
or git rebase
operates on the current branch, and it tends not to be a good idea to let these run if you have uncommitted work. Git will normally detect such a case and prevent the second command from running at all. In the distant past, though, the pull command could (and did) wreck in-progress work irrecoverably, because git pull (the old script, anyway) turned off a lot of safety checks.
In any case, the second command—the merge-or-rebase step—gets a bunch of extra arguments that made it work the same during the Git 1.4 to 1.6 transitional period when remotes and remote-tracking names were changing. That was almost 15 years ago now, but it still works the same way. If you use:
git fetch
git merge
and your Git makes a merge commit, the default merge message will be something like:
merge branch origin/dave into dave
but if you use:
git pull
the default merge message will be more like:
merge branch dave of <url> into dave
The "something like" is because the exact spelling of each message here depends on the branch names (obviously), and whether you're merging into master
—this omits the into <branch>
part—and there are some quote marks that get inserted that I didn't want to bother with here. :-)