I am working with a local repository and a repository on my server. The server branch should have the name "master" and the local branch I would like to name "local".
With git branch -a I listed all my branches:
* master
remotes/origin/HEAD -> origin/master
remotes/origin/master
I guess that * master here is the local repository I am actually working at the moment.
What I do not understand is, what is origin/HEAD? And why is it pointing to origin/master?
Is origin/master the same as master? I am confused.
So as a result I think I would need only this:
* local
remotes/origin/master
I guess that * master here is the local repository I am actually working at the moment.
Yes, but that's not the whole story.
What I do not understand is, what is origin/HEAD? And why is it pointing to origin/master?
As knittl said, you can usually just ignore it. What it is, is a symbolic reference. What that is, well, we'll come back to that.
One of the keys to understanding Git is to grok the idea that "everything is local". What that means is that, aside from some rare but important exceptions like when you run git fetch or git push, Git doesn't think about "other repositories" at all. There is only your repository. Everything is in your repository! That's all that exists, so everything has to be there.
To make that work, we start with the repository. At its heart, what a Git repository is, is a pair of databases. One database holds commit objects and other supporting objects (commits themselves are truly tiny and need other stuff to make them useful). The second database holds names, which we need because the keys used in the first database are so ugly.
Both databases are simple key-value stores. The objects database, holding commits and supporting objects, uses hash IDs as the keys. Given the hash ID, Git can quickly retrieve the object; when the object is a commit object (containing the commit's metadata), Git can use that to retrieve a full snapshot of all files. But the hash IDs—which Git needs: it can't retrieve the object without its hash ID—are too hard for human brains and fingers. So the names database holds names—branch names, tag names, and many other kinds of names—and maps those names to the hash IDs that Git needs.
Once you understand this, and that master is short for the name refs/heads/master, which is specifically a branch name because it starts with refs/heads/, you can see how Git takes the name master, puts the refs/heads/ back on the front, finds the name in the names database, uses that to retrieve the correct commit hash ID, and uses that hash ID to retrieve the commit's content (metadata and snapshot).
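A minimal shell sketch of that lookup, assuming only that git is on your PATH. It builds a throwaway repository in a temporary directory (the directory name demo, the placeholder identity, and the commit message are all made up) and asks Git for the hash ID behind both the short and the full name. git rev-parse does the names-database step; git cat-file shows what the objects database holds under the resulting key.

```shell
#!/bin/sh
# Sketch: a throwaway repository in a temporary directory (names are made up).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # make the first branch "master"
git config user.name "Demo User"          # placeholder identity
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"

# Names database: the short and the full name lead to the same hash ID.
short=$(git rev-parse master)
full=$(git rev-parse refs/heads/master)
echo "master            -> $short"
echo "refs/heads/master -> $full"

# Objects database: the hash ID is the key that retrieves the commit object.
git cat-file -t "$full"   # prints the object's type: commit
git cat-file -p "$full"   # prints the commit's metadata (tree, author, committer, message)
```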
That's almost the whole story. The rest is much longer, but is really just the filling-in of details.
The interesting names (by which I mean the names that aren't ones Git uses internally, such as replacement names, bisection temporary names, refs/stash for git stash, or whatever) fall into some major groups:
branch names, which start with refs/heads/;
tag names, which start with refs/tags/; and
remote-tracking names,1 which start with refs/remotes/.
Almost all names start with refs/. There are some funny ones, like HEAD, MERGE_HEAD, ORIG_HEAD, CHERRY_PICK_HEAD, and the like, that don't, but they tend to contain HEAD (in all uppercase like this). Git calls the names references, and is not always consistent about whether the funny ones are references (sometimes they are "pseudo-refs"). Most refs contain one (1) raw hash ID.
Sometimes, some refs contain the name of another ref. Git calls this a symbolic reference (or symref or similar for short). The only one of these that "always works right" in general, in all versions of Git, is HEAD. Long ago (back in the days of Git 1.7, perhaps) I made a test repository with a "branch" named INDIR containing the name of another branch. Asking Git to delete INDIR resulted in Git deleting the other branch! This is long since fixed, but it shows that symbolic refs are tricky.
When HEAD is a symref, which it usually is, it contains the name of the branch that you're on. In fact, that's precisely how Git knows which branch you're on: the special name HEAD contains the branch name. So when you see:
* master
  next
in your git branch output, for instance, this just means that HEAD contains the name master, so that you're on your master branch.
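To watch HEAD act as a symbolic ref, here is a small sketch in the same throwaway style (temporary directory, made-up names and identity). git symbolic-ref reads the branch name stored in HEAD. The script assumes Git 2.23 or later for git switch; on older versions, git checkout next does the same job.

```shell
#!/bin/sh
# Sketch: throwaway repository; the names and identity are made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git branch next                  # a second branch name, as in the listing above

git branch                       # "* master" and "  next"
git symbolic-ref HEAD            # refs/heads/master: HEAD holds a branch name
git symbolic-ref --short HEAD    # master

# Switching branches rewrites the name stored in HEAD (git switch: Git 2.23+).
git switch -q next
git symbolic-ref --short HEAD    # next
```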
1 Git calls them "remote-tracking branch names", but the word "branch" here is a bad idea in my opinion. In some error messages and documentation, Git calls them "remote branch names", which is even worse (again in my opinion). The phrase remote-tracking name isn't great either, but it's better than these two alternatives.
The ability to be "on" a branch name (using git switch or git checkout to select that branch name as the current branch name, which is then recorded in the special name HEAD) already makes a branch name slightly special. You can't pull this same trick with a tag or remote-tracking name, for instance:
git switch v2.1.0
gives you an error, as does:
git switch origin/master
(assuming there is an origin/master name; if not, you get an error there too, but it's a different error):
fatal: a branch is expected, got tag 'v2.1.0'
or:
fatal: a branch is expected, got remote branch 'origin/master'
You can't be "on" a tag or a remote-tracking name. That's because they are not branch names. You can only be "on" a branch (name), or in "detached HEAD" mode, using git switch --detach.
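A sketch that reproduces the refusal and then uses detached-HEAD mode instead, in a throwaway repository with a made-up tag v2.1.0. It assumes Git 2.23 or later for git switch. The exact wording of the error message may differ between Git versions, so the script only checks whether the command failed.

```shell
#!/bin/sh
# Sketch: throwaway repository; v2.1.0 is just the tag name from the example.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git tag v2.1.0

# Trying to get "on" a tag is refused.
if git switch v2.1.0 2>/dev/null; then
    echo "unexpected: switched to a tag"
else
    echo "git switch v2.1.0 was refused"
fi

# Detached-HEAD mode works: HEAD now holds a raw hash ID, not a branch name.
git switch -q --detach v2.1.0
if git symbolic-ref -q HEAD; then
    echo "still on a branch"
else
    echo "HEAD is detached at $(git rev-parse --short HEAD)"
fi
```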
Ignoring the case of detached-HEAD mode—which is not the way you normally want to work2—you are, then, always "on" some branch name, which means you need at least one branch name in your repository. You can have as many branch names as you want, or as few. Each one holds a hash ID—as every (non-symbolic) name does—so that name helps you, and Git, find that particular commit.
Note that you can have many names for one commit! That's perfectly fine: if you want version 1.0 to be version 1.0.0 as well, just make two tag names for that commit. You can do this with branch names too: given a commit hash ID H (where H stands in for the real hash ID), you can make two, or ten, or a million, branch names, all of which store hash ID H.
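A quick sketch of many names for one commit, in a throwaway repository. The branch names alpha and beta are arbitrary, and the shell variable H plays the role of the hash ID H above; every name should print the same hash ID.

```shell
#!/bin/sh
# Sketch: throwaway repository; alpha and beta are made-up branch names.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"

H=$(git rev-parse HEAD)      # plays the role of hash ID H
git tag v1.0 "$H"
git tag v1.0.0 "$H"          # two tag names, one commit
git branch alpha "$H"
git branch beta "$H"         # more branch names, same commit

for name in master v1.0 v1.0.0 alpha beta; do
    printf '%-7s %s\n' "$name" "$(git rev-parse "$name")"
done
```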
But let's get back to the really special property of a branch name. The act of checking out a commit, whether by name or directly by hash ID, whether "on a branch" or in "detached HEAD" mode, extracts all the files from that commit, so that you can use them. But if you change some files in your working tree, git add the updated files, and run git commit, you get a new commit with a new unique hash ID, and if you're "on" some branch, Git will store that new hash ID in the branch name. That is, Git will read HEAD, and if it's a symbolic ref to some branch name, Git will stuff the new commit hash ID into the names database, under that branch name.
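The same thing seen from the shell, as a sketch in a throwaway repository (hello.txt and its contents are made up): record master's hash ID, make a commit, and compare. HEAD keeps naming refs/heads/master, while refs/heads/master itself now holds the new commit's hash ID, and that commit's parent is the old one.

```shell
#!/bin/sh
# Sketch: throwaway repository; the file name and contents are made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
before=$(git rev-parse master)

echo hello > hello.txt           # change something in the working tree
git add hello.txt
git commit -q -m "add hello.txt"

after=$(git rev-parse master)
git symbolic-ref HEAD            # still refs/heads/master
echo "master moved from $before"
echo "                to $after"
git rev-parse 'master^'          # the new commit's parent: the old tip
```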
So the special property of a branch name is that it contains the hash ID of the most recent commit that is "on" that branch. This is actually the definition. It's not an accidental consequence of "new commit causes Git to update current branch name": the "new commit makes Git update" is, instead, a consequence of the definition of the special property.
You can, at any time, force any particular (existing) hash ID to go into any branch name. When you do that, the commit hash ID you've forced into that branch name is the last commit that is on that branch. You can, at any time, create a new branch name by selecting some commit hash ID and having Git create a new database entry using that ID, and when you do that, the commit you selected is the last commit that is on that branch.
There are usually some (or many) earlier commits also on that branch, but you don't specify them. They're specified in the commits. You pick the last commit and that comes with earlier commits. In the end, it's the commits that matter. You just find the most recent one, quickly and easily, using a branch name. The branch names move and change, over time, so that you can find particular commits quickly and easily.
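A sketch of the two operations described above (creating a branch name at an existing commit, and forcing a hash ID into one), plus the point that a last commit comes with its earlier commits. It uses a throwaway repository; experiment is a made-up branch name. git branch NAME HASH creates a name, git branch -f NAME HASH forces a different hash ID into an existing one (Git refuses -f on the branch you currently have checked out), and git log shows which commits the name leads to.

```shell
#!/bin/sh
# Sketch: throwaway repository; "experiment" is a made-up branch name.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second commit"
second=$(git rev-parse HEAD)

# A new name whose last commit is the first commit.
git branch experiment "$first"
git log --format=%s experiment        # first commit

# Force a different hash ID into that name. The first commit comes along,
# because the second commit records it as its parent.
git branch -f experiment "$second"
git log --format=%s experiment        # second commit, then first commit
```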
Hence, you want to answer, to yourself, this question: Which commits do you want to find quickly and easily? Those are ones for which you might consider creating branch names.
2 The git rebase command works in detached-HEAD mode until the rebase is complete. You may also want to use git switch --detach to look at, or use, a historical commit by its raw hash ID, without having to bother to invent a name for it. If you make new commits in this mode, though, you'll eventually run into problems, so except for the special case of rebasing, you really don't want to work in this mode.
The fact that this mode exists is why Git doesn't actually need any branch names at all. The fact that working in this mode will make you miserable is why you really want to have at least one branch name.
git push
Now, I noted above that Git normally thinks of itself as the king of all repositories because this repository, the one you're using right now, is the only repository.3 But sometimes, every now and then, we'd like to connect this repository to some other repository, using git fetch if we want to get their new commits from them, or git push if we want to send our new commits to them.
Sending our commits is somewhat straightforward. All Git repositories ultimately find, and identify, commits by hash ID, so we just present the raw hash IDs of our latest commit(s) to the other Git software. If they have those commits already, they will know, because hash IDs are unique but also universal. If not, they will know, because hash IDs are unique: they either have something with that hash ID, in which case, they have that commit, or they don't, in which case they don't have that commit. This unique-yet-universal property is the deep magic that propels the distributed part of Git as a DVCS.
So, we send them our new commits, which in general should add on to their existing commits. Then we ask them to create a new branch name, or update some existing branch name, so that they can remember, in their repository, the latest commit. That's what git push is all about: we send them commits, then we ask them to create or update some of their branch names.4 For convenience, Git lets us do this as:
git push <remote> <branch>
where the remote part is typically the literal string origin. That is the standard first (and often only) remote. We haven't defined remotes here, but a remote provides a URL by which our Git software can call up their Git software, whoever "they" may be, and ask them to stuff these commits into their repository and update their branch.
The branch argument here specifies both the commit (or commits) to send from our repository, starting (or ending?) with the last commit on our branch named branch, and the branch name we'd like their Git software to update. This way, if we have a branch named feature/tall, we can easily ask them to create or update a branch in their repository that is also named feature/tall.
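A sketch of that push, assuming a local bare repository (server.git in a temporary directory) stands in for the server, with its path serving as origin's URL. git ls-remote then shows the names the other side holds. git switch -c needs Git 2.23 or later; git checkout -b is the older spelling.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for the server (all names made up).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git remote add origin "$tmp/server.git"   # here origin's URL is just a path

git switch -q -c feature/tall             # Git 2.23+; or: git checkout -q -b feature/tall
git commit -q --allow-empty -m "make it taller"

# Send the commits, then ask them to create or update *their* feature/tall.
git push -q origin feature/tall
git ls-remote origin                      # the names on their side, with hash IDs
```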
Note the key concept here: we have our branch names, and they have their branch names. We often (perhaps even almost all the time) want to use the same name in our repository that they're using in their repository, and vice versa. We do this because it lets us use the simple git push origin feature/tall syntax.
It's possible to use different names on each "side", so that we have the name feature/tall while they have the name bob/loves/lisa or something equally silly. (The syntax for this is git push origin feature/tall:bob/loves/lisa.) Because we can use Git without any branch names at all, we can even use a raw commit hash ID on our side. But we have to use a name on their side: that's part of the defined Git protocol. It does not have to be a branch name (it can be a tag name, or some other kind of name), but it has to be a name, and they have to allow it, all of which is up to them.
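Both variations, sketched against a local bare repository standing in for the server (from-hash is a made-up name). When the source is one of our branch names, Git infers refs/heads/ for the unqualified destination bob/loves/lisa. When the source is a raw hash ID, Git has no way to infer what kind of name a new destination should be, so the sketch spells out the full refs/heads/ name.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for the server (all names made up).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git remote add origin "$tmp/server.git"
git switch -q -c feature/tall             # Git 2.23+
git commit -q --allow-empty -m "make it taller"

# Our name feature/tall, their name bob/loves/lisa.
git push -q origin feature/tall:bob/loves/lisa

# A raw hash ID on our side; their side still needs a (full) name.
git push -q origin "$(git rev-parse master)":refs/heads/from-hash

git ls-remote origin
```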
3 This was originally reflected in the Git source code as a bunch of global variables holding the data for the (as in, one, single, the only) repository. As Git evolved, the code improved to the point where there's now a data structure representing a repository, so that Git code could, in theory, work with more than one repository at a time, but there's still a global variable holding the repository.
Note that Git used to have the same issue with the working tree. Adding more working trees, via git worktree add, which was new in Git 2.5, meant gathering all those global variables up into a data structure, and creating multiple instances of the work-tree structure. There was more to it than just that, but that was a lot of the work. As there were some nasty bugs that were not fixed until Git 2.15, it's clear that this change was tricky.
4 We can ask them to create or update a tag name, or indeed, any kind of name at all, and we can ask them to delete some of their names. A single git push can request to send as many, or as few, objects as we like, and it can send as many, or as few, create/update/delete requests for ref-names as we like. But most of the time, we mostly use git push with one branch name.
git fetch
As with the sending process above, Git concedes (slightly) that other repositories exist, for the purpose of getting commits from them. To do this we use git fetch. Note that git clone, which copies an entire repository full of commits, also uses git fetch to get those commits: clone is just a wrapper that creates a new, empty repository, sets it up so that git fetch can run, runs git fetch, and then creates one branch and checks it out.
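A rough sketch of that description: one repository made with git clone, and one made by hand with git init, git remote add, git fetch, and a final checkout. This is a simplification (clone also records origin/HEAD, among other details), and a local bare repository named server.git, seeded from a made-up seed repository, stands in for the server.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for "their" repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master   # their HEAD names master

# Seed it with one commit on master (placeholder identity).
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master
cd "$tmp"

# The usual way:
git clone -q "$tmp/server.git" cloned

# Roughly the same thing by hand:
git init -q manual                                # a new, empty repository
cd manual
git remote add origin "$tmp/server.git"           # set it up so git fetch can run
git fetch -q origin                               # run git fetch
git checkout -q -b master --track origin/master   # create one branch and check it out
```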
When we do run git fetch, our Git software can see all of their names: all their branch names, all their tag names, and even all of their other names.5 In the early days of Git, before "remotes" were invented, these names just got stashed in a file you could look at later if you wanted, but people very quickly realized that these names were immensely valuable.
This happens because humans are human. We give names to branches because we believe that their most recent commits are valuable. We choose the names to reflect something about this value; e.g., feature/tall as a name tells us that this is a new feature we're working on to make the repository grow taller.6
So, if some other repository, one that we call origin, has a branch named master, that indicates that they, whoever they are, think there's something important named master. This might be important to us! Or, it might not.
Our Git software will, as a default and matter of course, create or update a remote-tracking name in our repository. This remote-tracking name will identify the same commit that their master identifies. Every time we run git fetch,7 our Git will check their master and update our remote-tracking name as appropriate.
Our Git software will build the name that our repository will use (the remote-tracking name) by starting with refs/remotes/, which is required for all remote-tracking names, then adding the name of the remote itself, which is almost always origin here. Then our Git software adds one more slash, and then it copies the (shortened) branch name. So if they have refs/heads/master (the branch named master), we get refs/remotes/origin/master.
To put it more simply, though less precisely, our Git software just takes each of their branch names and renames them by sticking origin/ in front. These become our remote-tracking names. Our Git repository gets all of their commits, and all the ones they think are important (as indicated by having a branch name that selects that commit) get marked up in our repository using our remote-tracking names.
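A sketch showing where that renaming rule lives: git remote add stores a fetch refspec in the configuration, and git fetch applies it. A local bare repository with made-up branches master and feature/tall stands in for origin. The refspec +refs/heads/*:refs/remotes/origin/* reads as "their refs/heads/anything becomes our refs/remotes/origin/anything"; the leading + lets Git update a remote-tracking name even when the update is not a fast-forward.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for origin (branch names made up).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
# Give "them" two branch names: master and feature/tall.
git push -q "$tmp/server.git" HEAD:refs/heads/master HEAD:refs/heads/feature/tall
cd "$tmp"

git init -q client
cd client
git remote add origin "$tmp/server.git"
git config --get remote.origin.fetch      # +refs/heads/*:refs/remotes/origin/*
git fetch -q origin

# Their refs/heads/X has become our refs/remotes/origin/X:
git for-each-ref --format='%(refname)' refs/remotes/origin/
git branch -r                             # the same names, shown as origin/...
```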
5 Technically, we see only those names they permit us to see. Git has some configuration items to hide various refs or namespaces. Servers may use this for various purposes, for instance. It's not meant as a real security mechanism, because git fetch can request objects by raw hash IDs directly, so even if someone can't see some name, they might be able to find a hash ID and get the object that way. As a Git user, or even as a security officer, you should think of anything that's in any Git repository as being fully exposed to anyone who can see any part of that repository. Access security in Git tends to be an all-or-nothing deal, and that's by Linus Torvalds's design: if they can't load any part of the repository, they can't get anything, but if they can load any part of the repository, they can load all of the repository.
6 Whatever that means. I think it refers to how, if one just has enough will power, one can get taller, just as we use will power to gain or lose weight. 😀
7 You can run git fetch in a limited mode, where it's unable to update all remote-tracking names. Or, you can run it in its more normal, unlimited mode: git fetch origin or even just git fetch. In the unlimited mode, this will update all your remote-tracking names. There are a bunch of exceptions here, though (not just the limited mode), and older versions of Git (pre-1.8.4) are particularly bad about them.
Note that there's a thing I consider a sort of a bug here. Suppose their Git repository has a branch named xyzzy, but only briefly. If you run git fetch origin while they have xyzzy, your Git sees their xyzzy and creates your own origin/xyzzy. That's all fine. But then they delete their xyzzy and you run git fetch again. This time they have no branch named xyzzy. OK, so what? Well, so: your Git software fails to delete your origin/xyzzy. Your origin/xyzzy represents a branch name that used to exist, but no longer exists.
Is this bad? No, not really, it's so very minor; but yes, it's awful, it's terrible! It all depends on your point of view, and also on how often and how many of these "stale" remote-tracking names you accumulate. If "their" repository creates 10000 of these a minute and you pick them all up, you have millions of them by the next day (10000 × 1440 minutes = 14.4 million, to be precise). They clutter things up.
You can instruct your Git software to delete these "stale" remote-tracking names using git fetch --prune or git remote prune, or you can configure fetch.prune to true. I think this probably should be the default, myself (so I have it configured to true).
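A sketch of the stale-name problem and its cure, using a local bare repository (standing in for origin) whose made-up xyzzy branch is deleted after we have cloned. The script sets fetch.prune to false first, in case your own configuration already turns pruning on.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for origin; xyzzy is short-lived.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master HEAD:refs/heads/xyzzy
cd "$tmp"

git clone -q "$tmp/server.git" client     # we pick up origin/xyzzy
cd client
git config fetch.prune false              # pruning off, whatever your own config says

# They delete their xyzzy; a plain fetch leaves our origin/xyzzy behind.
git --git-dir="$tmp/server.git" update-ref -d refs/heads/xyzzy
git fetch -q origin
if git show-ref --verify -q refs/remotes/origin/xyzzy; then stale=yes; else stale=no; fi
echo "stale origin/xyzzy after plain fetch: $stale"

# --prune deletes remote-tracking names whose branch is gone on their side.
git fetch -q --prune origin
git branch -r

# To prune on every fetch from now on (in this repository):
git config fetch.prune true
```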
With git branch -a I listed all my branches:
* master
remotes/origin/HEAD -> origin/master
remotes/origin/master
What you have here is three names, but they're not all branch names. The git branch -a command lists both branch names and remote-tracking names.
If you use git branch with no options, you get a listing of your branch names, with refs/heads/ removed, so that you would just see master.
If you use git branch -r to view remote-tracking names, you get a listing of your remote-tracking names, with refs/remotes/ removed, so that you would see origin/HEAD and origin/master.
For some reason (nobody seems to know why), when you use git branch -a to list both, the section that lists remote-tracking names removes only refs/ instead of refs/remotes/, so that you see the three you showed above.
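A sketch that reproduces the three listings in a fresh clone of a local bare repository standing in for origin (the names are made up). In a clone, origin/HEAD is created by git clone itself, which is why it shows up here.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for origin.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master
cd "$tmp"

git clone -q "$tmp/server.git" client
cd client
echo "git branch:";    git branch      # * master
echo "git branch -r:"; git branch -r   # origin/HEAD -> origin/master, origin/master
echo "git branch -a:"; git branch -a   # * master, remotes/origin/HEAD -> origin/master,
                                       # remotes/origin/master
```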
The middle one of these three is a symbolic reference, which only really works right for HEAD. When your Git software calls up their Git software with git fetch, they will show not only their regular refs, but also the HEAD pseudo-ref, and your Git can use that to figure out which branch they are "on". The Git developers seem to have had a more detailed plan for this at one point, but for now, aside from git clone time, there seems to be little practical use for this information. That's why you can just ignore it.8
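A sketch for peeking at origin/HEAD, in a fresh clone of a local bare repository standing in for origin. git ls-remote --symref (Git 2.8 or later) shows the symbolic HEAD that their Git advertises. git remote set-head is one of the sub-commands footnote 8 alludes to: --delete removes our origin/HEAD, and --auto asks origin again and recreates it.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for origin.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master
cd "$tmp"

git clone -q "$tmp/server.git" client
cd client
git symbolic-ref refs/remotes/origin/HEAD   # refs/remotes/origin/master
git ls-remote --symref origin HEAD          # "ref: refs/heads/master  HEAD", then the hash ID

git remote set-head origin --delete         # our origin/HEAD is gone
git remote set-head origin --auto           # ask origin's HEAD and recreate it
git symbolic-ref refs/remotes/origin/HEAD   # refs/remotes/origin/master again
```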
The origin/master name is your own local name, but it reflects that your Git software saw a branch named master in their Git's listing. Every time you run git fetch to origin (though see footnote 7), your Git will update this.
8 The git remote command has some sub-commands designed to manipulate this. There are some ways you could use this, but I never do: I find little value here, and I worry that at some point, the Git developers might resurrect whatever plan they had, and it would conflict with anything I wanted to do with it. So I just ignore it myself.
What git clone does with this information is straightforward. When you run git clone url, you're allowed to specify a branch name with the -b option. If you do specify such a name, that's the name your Git will try to create and check out, after running git fetch. If you don't, though (and most people mostly don't, it seems), your Git uses their HEAD to figure out which name they recommend. Git calls that the default branch, and hosting sites like GitHub typically let you set this (but Google do not let you set the default branch, which wasn't important until the Great Renaming from master to main, and now it's kind of a problem).
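A sketch of both cases against a local bare repository whose HEAD names master and which also has a made-up feature/tall branch: with -b, the clone checks out the name you chose; without it, the clone follows their HEAD.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for the hosting site.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master   # their default branch
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master HEAD:refs/heads/feature/tall
cd "$tmp"

git clone -q -b feature/tall "$tmp/server.git" tall-clone
git clone -q "$tmp/server.git" default-clone
git -C tall-clone symbolic-ref --short HEAD      # feature/tall: the name we asked for
git -C default-clone symbolic-ref --short HEAD   # master: the name their HEAD recommends
```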
Is origin/master the same as master? I am confused.
origin/master is a remote-tracking name, representing their (some other Git repository that you call origin) master branch name. This name makes it easy for you to find a specific commit.
master is a branch name, making it easy for you to find a specific commit.
Right after git clone, both names represent the same commit. Since branch names move when you move them, you can change yours; since their branch name will move when they move it, they can change theirs (and your origin/master follows along the next time you run git fetch). If either of you changes the hash ID stored in either name, the two names will stop representing the same commit.
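A sketch of the two names starting out equal and then diverging, using a fresh clone of a local bare repository and an empty commit as the "local work". git rev-list --count origin/master..master counts the commits reachable from master but not from origin/master.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for origin.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master
cd "$tmp"

git clone -q "$tmp/server.git" client
cd client
git config user.name "Demo User"
git config user.email demo@example.com
git rev-parse master origin/master          # the same hash ID, twice

git commit -q --allow-empty -m "local work" # move our master only
git rev-parse master origin/master          # now two different hash IDs
git rev-list --count origin/master..master  # 1
```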
... the local branch I would like to name "local".
So create a branch named local, or rename the name master to local.
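Both options, sketched in two separate clones of a local bare repository standing in for the server. git branch local creates a second name for the current commit; git branch -m master local renames master, and since master is the current branch, HEAD follows the rename.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for the server.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master
cd "$tmp"

# Option 1: a second branch name, local, for the same commit as master.
git clone -q "$tmp/server.git" keep-master
git -C keep-master branch local
git -C keep-master branch                 # "  local" and "* master"

# Option 2: rename master to local; HEAD follows the rename.
git clone -q "$tmp/server.git" renamed
git -C renamed branch -m master local
git -C renamed branch                     # "* local"
```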
There's one caveat about renaming the existing master, as there's something about branch names I did not cover above. Each branch name can have one (1) upstream set. Or it can have no upstream at all. The upstream setting of a branch name is supposed to gain you some convenience. When and whether you find this convenient is up to you: some people don't bother at all.
The existing master will have its upstream set to origin/master. If you create a new branch name local, you can choose whether it has any upstream setting, and if so, what that is. If you rename the existing master, it will retain its current upstream setting, but you can, if you like, then change or remove the upstream setting.
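A sketch of the upstream caveat, in a fresh clone of a local bare repository standing in for origin: after the rename, local still has origin/master as its upstream. The 'local@{upstream}' syntax names a branch's upstream, git branch -vv shows it in brackets, and --unset-upstream and --set-upstream-to remove or set it.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for origin.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git push -q "$tmp/server.git" HEAD:refs/heads/master
cd "$tmp"

git clone -q "$tmp/server.git" client
cd client
git branch -m master local
git rev-parse --abbrev-ref 'local@{upstream}'     # origin/master: the setting came along
git branch -vv                                    # shows [origin/master] beside local

git branch --unset-upstream local                 # remove the upstream setting...
git branch --set-upstream-to=origin/master local  # ...or (re)set it explicitly
```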
For more about upstreams, see "Why do I need to do `--set-upstream` all the time?" and "Why call git branch --unset-upstream to fixup?" (and other questions mentioning upstream).