
Fetch upstream from another repo and then push the changes in my local branch


I have two projects; one is a mirror of the other. I have a branch in the non-mirror project that I need to move to the mirror project.

I'm doing the following:

git remote add upstream https://github.com/my/nomirrorProject.git
git fetch upstream upstreamBranch:mylocalbranch

But I'm getting the following error message:

fatal: Refusing to fetch into current branch refs/heads/myLocalBranch of non-bare repository

git push origin mylocalbranch

Any ideas?

Thanks!


Solution

  • TL;DR

    Unless you know exactly what you are doing, do not use the git fetch upstream upstreamBranch:mylocalbranch syntax. Similarly, do not use git fetch origin theirbranch:mybranch. Instead, use git fetch upstream followed by one of:

    • git checkout or git switch, or
    • git merge, or
    • git rebase

    depending on your intended goal.
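For the situation in the question, the whole thing might look roughly like the sketch below. It is only a sketch under assumptions: the two bare repositories (nomirror.git and mirror.git) are throwaway local stand-ins for the two hosted projects, the identity is a placeholder, and I am assuming the goal is a new local branch that starts where upstreamBranch is.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-ins for the two hosted projects (-b needs Git 2.28+).
git init -q --bare -b main "$work/nomirror.git"
git init -q --bare -b main "$work/mirror.git"

# Seed both with main, and give only the non-mirror project upstreamBranch.
git init -q -b main seed
cd seed
git commit -q --allow-empty -m "initial commit"
git push -q "$work/nomirror.git" main
git push -q "$work/mirror.git" main
git switch -q -c upstreamBranch
git commit -q --allow-empty -m "work that exists only in the non-mirror project"
git push -q "$work/nomirror.git" upstreamBranch
cd ..

# The workflow itself, run in a clone of the mirror.
git clone -q "$work/mirror.git" clone
cd clone
git remote add upstream "$work/nomirror.git"
git fetch -q upstream                                   # creates upstream/* names only
git switch -q -c mylocalbranch upstream/upstreamBranch  # new local branch at that commit
git push -q origin mylocalbranch                        # publish it to the mirror

pushed=$(git --git-dir="$work/mirror.git" rev-parse refs/heads/mylocalbranch)
expected=$(git rev-parse upstream/upstreamBranch)
echo "mirror now has mylocalbranch at $pushed"
```

If mylocalbranch already exists and is checked out, the git switch -c step does not apply; git merge upstream/upstreamBranch or git rebase upstream/upstreamBranch would be the step to consider instead.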

    What you're doing

    Git is all about commits. Git isn't really about branches, though branch names help you (and Git) find commits; Git is about the commits. Git isn't really about files either, though each commit contains files. This means you need to know, first, what a commit is and does for you, and, second, that a Git repository consists of several databases plus some extra stuff to make them useful for you. The first (and usually biggest by far) database holds the commits and other objects.

    Commits

    Commits, in Git:

    • Are numbered. Every Git commit has a globally (across every Git repository ever, even if it's not related to your Git repository) unique ID, which Git calls a hash ID or an object ID (OID). This is how two Git repositories, when they meet on the street (or on the net), decide whether they have a commit in common, or not: by comparing these IDs. These hash IDs are very large and ugly; they look random to humans, although they're not random at all; and humans basically never use them directly (that would drive us crazy).

    • Hold snapshots and metadata:

      • Each commit has a full snapshot of every file—or more precisely, but it sounds redundant, every file that it has. The redundant-sounding phrase takes care of the fact that some commits have new files added, and some subsequent commits may have files removed. Each commit, once made, is frozen for all time, so its saved files are available forever.

        The files inside the commits are stored in a special, read-only, Git-only format in which they're compressed and de-duplicated. So the fact that one archive (commit) mostly re-uses files from some previous commit means that these archives take very little space. In fact, if you make a new commit that completely re-uses old files—this can happen in any number of ways—the new commit takes no space at all to hold the files, just a little space to hold the metadata.

      • Meanwhile, each commit holds some metadata. This is also frozen for all time (the hashing scheme depends on this). The metadata include things like the name and email address of the person who made the commit. They include a log message, where you get to write why you made the commit. (Don't just say that you changed line 42 or whatever: Git can figure that out from the snapshot. Say why you changed line 42. What was wrong with it before? What behavior did the program exhibit that was bad, that is now corrected by this change?)

        In this metadata, Git stores some information that Git needs: specifically, the raw hash ID(s) of a list of earlier commits. Git calls these the parents of the commit.

    Usually there is exactly one hash ID in this metadata list. That is, most commits have just one parent. These are your ordinary commits.
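To see the snapshot-plus-metadata structure for yourself, something like the following should work. It is a minimal sketch in a throwaway repository; the file name and the identity are made up for illustration.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"
echo "hello again" >> file.txt
git commit -q -am "second commit"

# Raw content of the newest commit: a tree line (the snapshot), a parent
# line, author and committer lines, then the log message.
git cat-file -p HEAD

# Count the parent lines: none for the first commit, one for the second.
first_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
second_parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "first commit: $first_parents parent(s); second commit: $second_parents parent(s)"
```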

    By holding the hash ID of a single parent, each commit "points to" its predecessor. This makes a backwards chain of commits. For instance, suppose we have some commit with some hash that we'll call H, and we draw it with an arrow coming out of it, representing this backwards pointer to its parent commit:

                <-H
    

    The earlier commit to which H points has some other different hash ID, but, just like H, stores a snapshot and metadata, so let's draw this commit as commit G with a backwards arrow coming out of it:

            <-G <-H
    

    Commit G thus points to a still-earlier commit. Let's call it F:

    ... <-F <-G <-H
    

    F points backwards yet again, and so on. This is the history in the repository, starting (ending?) at commit H and working (backwards), one commit at a time.

    The history in a repository, in other words, is nothing more or less than the commits in the repository. Each commit has a full snapshot of every file, frozen in time as of the form that file had at the time you (or whoever) made the commit. And, each commit has a unique number; we're just using these uppercase letters to let our feeble human brains manage them here.

    Note that the very first commit ever made in some repository has no parent, because it can't have one. So it just has no arrow coming out of it. We could draw an entire chain of eight commits this way, then:

    A--B--C--D--E--F--G--H
    

    Commit H is the last commit, at the start (end?) of history, with A as the first commit, at the end (start?) of history. Git works backwards, so history "starts at the end".
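The backwards walk can be watched directly. This is a small sketch using three empty commits in a scratch repository (the commit messages are placeholders); git log and git rev-list both start at the named commit and follow parent links.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main

for name in A B C; do
    git commit -q --allow-empty -m "commit $name"
done

# Newest first; %p prints the abbreviated parent hash (empty for the root).
git log --format='%h %s  (parent: %p)'

count=$(git rev-list --count HEAD)                        # commits reachable from HEAD
newest=$(git log -1 --format=%s)                          # where history "starts"
root=$(git log --format=%s --max-parents=0 HEAD)          # the commit with no parent
echo "$count commits, from '$newest' back to '$root'"
```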

    This is where branch names come in

    Git needs a fast way to find the last commit. It's easy for us to see the last one in these simple drawings, but real repositories can have thousands or millions of commits and any drawings you make will generally get very messy (this depends on the repository). So to provide an easy way to find the last commit, Git uses a branch name, like this:

    ...--G--H   <-- main
    

    The name main simply contains the raw hash ID of the last commit in the chain. From here, Git will work backwards as usual.

    If we want to have more than one branch name, we just create another name, also pointing to commit H, like this:

    ...--G--H   <-- develop, main
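A quick way to convince yourself that a branch name is just a holder for one hash ID, sketched in a scratch repository with a placeholder identity:

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git commit -q --allow-empty -m "commit H"

git branch develop    # a second name for the same commit; no new commit is made

main_id=$(git rev-parse main)
develop_id=$(git rev-parse develop)

# Each branch name resolves to one raw hash ID.
git for-each-ref --format='%(refname:short) -> %(objectname)' refs/heads
```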
    

    Using commits

    While commits store files (snapshots) and metadata, the stored files are read-only, and in a format—a Git internal object—that only Git can read in the first place. No other programs can read these files, and nothing—not even Git itself—can overwrite them. But that's not how our computer programs want to work. They want to read and write real files, not weird Git-ized internal objects.

    To use a commit, then, Git has to copy all the files out of the snapshot. This is what git checkout or git switch does.[1] You pick a commit you want Git to extract, and run:

    git switch develop
    

    for instance to pick out commit H. Git now extracts the files into a work area, which Git calls your working tree or work-tree, where you can see them and, if you like, change them too.

    Note that these are your files to do whatever you like with. Git isn't using them. Git will, if you tell it to, eventually copy them back to make them ready for a new commit, using another area Git calls the staging area, which we won't describe properly here. But for now these are your files. If you run git checkout or git switch again, Git might remove these files and put in other files.


    [1] You can use either command; git switch is the newer and less-powerful, therefore less-dangerous one. Think of an overly complicated Swiss army knife: do you want the one where there's a self-starting chainsaw blade, or do you want the one that only has a regular knife blade? Sometimes you might want the chainsaw, but it's probably better to keep that as a separate tool.


    Updating a branch name

    Let's look now, briefly, at how a branch name gets updated when you run git commit. You've run git switch develop to select commit H to work on. Git attached the special name HEAD to the name develop to remember that this is the current branch name, like this:

    ...--G--H   <-- develop (HEAD), main
    

    You make changes to various files and run git add (for reasons we're skipping over), and then run git commit. Git prepares a new commit, gathering metadata (your name and email address, your log message, and, for Git's history purposes, the raw hash ID of commit H) and makes a new snapshot of all files, taking into account the updated, new, and/or removed files. These all go together into a new commit I, whose parent is existing commit H. Let's draw it in:

              I
             /
    ...--G--H
    

    I've drawn I on a new line, and left out the names, on purpose. Let's put the names back in now. Git has done something very sneaky here: the name develop, the one HEAD is attached to, no longer points to commit H!

              I   <-- develop (HEAD)
             /
    ...--G--H   <-- main
    

    If we add another new commit J, we get:

              I--J   <-- develop (HEAD)
             /
    ...--G--H   <-- main
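The name movement can be checked with git rev-parse before and after a commit. A minimal sketch, using empty commits and a placeholder identity in a scratch repository:

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git commit -q --allow-empty -m "commit H"

git switch -q -c develop             # HEAD is now attached to develop
before=$(git rev-parse develop)      # hash of H

git commit -q --allow-empty -m "commit I"

after=$(git rev-parse develop)       # hash of I: develop moved
parent_of_new=$(git rev-parse develop~1)
main_id=$(git rev-parse main)        # main did not move
current=$(git symbolic-ref --short HEAD)

echo "on $current: develop went from $before to $after; main is still $main_id"
```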
    

    Note that there are now two commits that are on develop that are not on main. If we git switch main, Git will remove from our working tree all the files from commit J, and put in place instead all the files from commit H:

              I--J   <-- develop
             /
    ...--G--H   <-- main (HEAD)
    

    We're now "on" main again, using the files from commit H. The newest branch-main commit is commit H, while the newest branch-develop commit is commit J.
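Here is a small sketch of the file swapping that git switch does. The file names and contents are invented for the example, and a single commit on develop stands in for I and J.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main

echo "version from H" > notes.txt
git add notes.txt
git commit -q -m "commit H"

git switch -q -c develop
echo "version from J" > notes.txt
echo "only on develop" > extra.txt
git add notes.txt extra.txt
git commit -q -m "commit J"

# Back on main, the working tree holds commit H's files again.
git switch -q main
on_main=$(cat notes.txt)
if [ -e extra.txt ]; then extra_on_main=yes; else extra_on_main=no; fi

# And on develop, commit J's files come back.
git switch -q develop
on_develop=$(cat notes.txt)

echo "main: '$on_main' (extra.txt present: $extra_on_main); develop: '$on_develop'"
```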

    Let's make another new branch now, named topic, and switch to it. This will also point to commit H:

              I--J   <-- develop
             /
    ...--G--H   <-- main, topic (HEAD)
    

    Now let's change some file(s), git add, and git commit. This makes a new commit K whose parent is H (not I, not J, but H), because H is the current commit as found by the current branch name topic. Then, having made commit K, Git writes the hash ID of commit K into the name topic:

              I--J   <-- develop
             /
    ...--G--H   <-- main
             \
              K   <-- topic (HEAD)
    

    These are our branches: H is the latest commit on main, J is the latest on develop, and K is the latest on topic. History works backwards from here, so from K we go back to H, then G, and so on; from J we go back to I, then H, then G, and so on; and from H, we go back to G and so on.

    This also means that all commits up through H are on all three branches. In Git, commits are often on more than one branch.
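To check which branches contain a given commit, git for-each-ref (or git branch) accepts --contains. The sketch below rebuilds the three-branch picture with empty commits; identity and messages are placeholders.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git commit -q --allow-empty -m "commit H"
h=$(git rev-parse HEAD)

git switch -q -c develop
git commit -q --allow-empty -m "commit I"
git commit -q --allow-empty -m "commit J"

git switch -q main
git switch -q -c topic
git commit -q --allow-empty -m "commit K"
k=$(git rev-parse HEAD)

# List the branch names from which each commit is reachable.
with_h=$(git for-each-ref --format='%(refname:short)' --contains "$h" refs/heads | paste -sd ' ' -)
with_k=$(git for-each-ref --format='%(refname:short)' --contains "$k" refs/heads | paste -sd ' ' -)

echo "H is on: $with_h"
echo "K is on: $with_k"
```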

    Branch names are not the only kinds of names in Git

    Besides branch names, we can also have tag names, for instance. The key differences between these two kinds of names are:

    • You can't get "on" a tag name: git checkout v1.2, if v1.2 is a tag name, produces what Git calls a detached HEAD, and git switch v1.2 gives you an error unless you add --detach to allow Git to go into detached-HEAD mode.

    • Tag names do not automatically update. This is an outgrowth of the fact that you can't get "on" a tag name. When you make a new commit, Git updates the branch name that you're on, and in detached-HEAD mode, you are on no branch at all.

    • Tag names get shared.

    To explain that last point, it's time to talk about clones and git fetch.
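Before moving on, the first two differences can be checked in a scratch repository. This is a sketch only; the tag name v1.2 is borrowed from the text above and the identity is a placeholder.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git commit -q --allow-empty -m "release commit"
git tag v1.2
tagged=$(git rev-parse v1.2)

# Plain git switch refuses a tag name; --detach permits detached-HEAD mode.
if git switch -q v1.2 2>/dev/null; then plain_switch=accepted; else plain_switch=refused; fi
git switch -q --detach v1.2

# With a detached HEAD, symbolic-ref has no branch name to report.
if git symbolic-ref -q HEAD >/dev/null; then head_state=attached; else head_state=detached; fi

# A new commit moves HEAD only: neither the tag nor main follows it.
git commit -q --allow-empty -m "commit made in detached-HEAD mode"
tag_after=$(git rev-parse v1.2)
main_after=$(git rev-parse main)
head_after=$(git rev-parse HEAD)

echo "plain switch: $plain_switch; HEAD: $head_state"
```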

    Clones

    I mentioned earlier that a Git repository consists primarily of two databases. One database holds commits and other internal objects, all found by object IDs. (Commits are one of four internal Git object types, though to use Git you mostly don't have to know this—unlike the rest of what I'm writing here.)

    The other primary database holds names: branch names, tag names, and all of Git's other names. These names all hold object IDs: mostly commit IDs (with tag names being a sometimes notable exception, but then the tags wind up pointing to a commit indirectly, so that you mostly don't have to know about annotated tag objects). We will skip over this detail here, but it crops up later when you go to create tags.

    When you clone a Git repository, with:

    git clone <url>
    

    you're instructing your Git to:

    1. create a new, empty directory (or use an existing empty directory) to create a new, empty repository;
    2. add a thing Git calls a remote—a short name that holds a URL, and the standard first "remote" is named origin—so that your Git can call up some other Git repository any time you like;
    3. call up that other Git repository now, and get all of their commits, but don't—quite—copy their branch names;
    4. rename their branch names instead, but do (usually) get all their tag names; and
    5. create one branch name in your new clone.

    So you have your Git software copy their commits and other objects database, but you don't have your Git copy their branch names. Instead, you have your Git take each of their branch names and turn those into remote-tracking names.

    A remote-tracking name is essentially[2] formed by taking their branch name, such as main or develop or feature/tall or whatever, and sticking your own remote name (origin for this initial clone) in front to get origin/main, origin/develop, origin/feature/tall, and so on. Your Git does this with all their branch names. Your Git doesn't do this with their tag names: if they have a v1.2 and a v2.0, your Git will create your own tag names spelled v1.2 and v2.0 too.

    So tag names are different from branch names in this extra way: not only are they not supposed to move—they should identify one particular commit forever, rather than the latest commit on some branch—but they also get shared. Branch names aren't shared.
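The renaming is easy to observe by cloning a small local repository. In this sketch the directory names theirs and mine are invented, and the branch and tag names are the ones used as examples above.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# "Their" repository: three branch names and one tag, all at one commit.
git init -q -b main theirs
cd theirs
git commit -q --allow-empty -m "commit H"
git branch develop
git branch feature/tall
git tag v1.2
cd ..

git clone -q "$work/theirs" mine
cd mine

# One branch of your own, one remote-tracking name per branch of theirs
# (origin/HEAD is filtered out), and their tag under its original spelling.
local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads | paste -sd ' ' -)
remote_names=$(git for-each-ref --format='%(refname:lstrip=2)' refs/remotes/origin \
    | grep -v '/HEAD$' | paste -sd ' ' -)
tags=$(git tag | paste -sd ' ' -)

echo "branches: $local_branches"
echo "remote-tracking: $remote_names"
echo "tags: $tags"
```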


    [2] This glosses over a lot of detail.


    Adding remotes and using git fetch

    You can have as many remotes as you like. The first one is normally called origin, and git clone makes this one for you. In fact, git clone url is basically short for a six-command sequence, five of which are Git commands:

    1. mkdir (or whatever command your OS uses to make a new empty directory), with all the Git commands being run in the new directory;
    2. git init, to create an empty repository in this new directory;
    3. git remote add origin url, to add origin as the remote;
    4. any extra git config commands needed (sometimes there are a few);
    5. git fetch origin, to get all the commits and rename the branches; and
    6. git checkout / git switch with the "create a new branch" option.

    The branch that your Git checks out in step 6 is the one you choose with the -b option to your git clone command. If you don't give a -b option, your Git asks their Git software which branch name their repository recommends. Your Git then uses that branch name, which your Git renamed to origin/whatever, to create your branch whatever, pointing to the same commit as your origin/whatever.
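Spelled out by hand, the six steps might look like the sketch below. It is only an approximation of what git clone does (a real clone also sets up things like origin/HEAD), and the directory names theirs and mine are hypothetical.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# A small repository to "clone" from.
git init -q -b main theirs
git -C theirs commit -q --allow-empty -m "commit H"

mkdir mine                              # step 1: new empty directory
cd mine
git init -q                             # step 2: new empty repository
git remote add origin "$work/theirs"    # step 3: add the remote
                                        # step 4: no extra git config needed here
git fetch -q origin                     # step 5: get commits, create origin/* names
git switch -q -c main origin/main       # step 6: create and check out one branch

mine_main=$(git rev-parse main)
their_main=$(git -C "$work/theirs" rev-parse main)
current=$(git symbolic-ref --short HEAD)
echo "on $current at $mine_main"
```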

    If their recommended name is main, then, you might end up with this:

              I--J   <-- origin/develop
             /
    ...--G--H   <-- main (HEAD), origin/main
             \
              K   <-- origin/topic
    

    Note how you have one remote-tracking name for each of their branch names, plus one branch name of your own.

    You can now run git remote add upstream if you like, to add a remote named upstream. Give a URL that your Git should call up. Then run:

    git fetch upstream
    

    with no arguments, and your Git will call up that Git. They will list out, for your Git, all of their branch names and the commit hash IDs that go with those branch names.

    Because of your earlier git clone, your Git probably already has most if not all of these commits, found via your origin/* remote-tracking names. For any commits that they do have, that you don't, your Git will ask their Git to package up and send over these commits. That might include some extra commits, or not. In any case, your Git now takes each of their (upstream's) branch names and renames them to form names like upstream/main:

              I--J   <-- origin/develop, upstream/develop
             /
    ...--G--H   <-- main (HEAD), origin/main, upstream/main
             \
              K   <-- origin/topic
               \
                L   <-- upstream/topic
    

    Here, they have the same three branch names on upstream as your Git found on the Git you're calling origin. But upstream's topic names commit L, not commit K. So your Git obtained commit L from them. Your Git did not need to obtain any other commits—you had the rest already—and then your Git created your upstream/* names.
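A scaled-down version of this picture can be built locally. In the sketch below, origin-repo and upstream-repo are hypothetical local repositories standing in for the two remotes, and develop is left out to keep it short.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# origin-repo: main at H, topic at K.
git init -q -b main origin-repo
cd origin-repo
git commit -q --allow-empty -m "commit H"
git switch -q -c topic
git commit -q --allow-empty -m "commit K"
git switch -q main
cd ..

# upstream-repo: starts as a copy, then gains commit L on its topic.
git clone -q "$work/origin-repo" upstream-repo
cd upstream-repo
git switch -q -c topic origin/topic
git commit -q --allow-empty -m "commit L"
cd ..

# Your clone: add the second remote and fetch with no extra arguments.
git clone -q "$work/origin-repo" mine
cd mine
git remote add upstream "$work/upstream-repo"
git fetch -q upstream

origin_topic=$(git rev-parse origin/topic)                 # K
upstream_topic=$(git rev-parse upstream/topic)             # L
parent_of_upstream_topic=$(git rev-parse upstream/topic~1)
local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads | paste -sd ' ' -)

echo "origin/topic=$origin_topic upstream/topic=$upstream_topic"
```

Note that the fetch created only upstream/* names; the one local branch is untouched.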

    What you're doing with git fetch upstream theirbranch:mybranch

    Above, I described the normal operation of git fetch remote, when you don't use any extra arguments. If you do use extra arguments, such as:

    git fetch origin main
    

    or:

    git fetch upstream main
    

    each remaining argument after the remote is what Git calls a refspec.

    A refspec can get complicated, but it comes in two relatively simple forms. One form is like this: just a branch or tag name. Git will figure out from context whether it's a branch name or tag name, if Git can do that; if not, you must help Git out by telling Git explicitly that this is a branch or tag name, which we won't show here.

    The more complicated form has two names separated by a colon : character:

    git fetch upstream main:upmain
    

    The name on the left is a source, which for git fetch is the remote repository's branch or tag name.[3] The name on the right is the destination: for git fetch, that's the branch or tag name you'd like your Git to create or update in your repository.

    This update operation works by shoving a new hash ID into the name, if the name exists, or by creating the branch or tag name holding the hash ID, if the name does not yet exist.

    If you're on your main branch like this:

              I--J   <-- origin/develop, upstream/develop
             /
    ...--G--H   <-- main (HEAD), origin/main, upstream/main
             \
              K   <-- origin/topic
               \
                L   <-- upstream/topic
    

    then your current branch is main and your current commit is commit H.

    If you were to run:

    git fetch upstream topic:topic
    

    that would tell your Git to go over to upstream, find that they have commit L as their topic, bring over commit L if needed—it's not needed because you have it now—and then create or update your branch name topic to point to commit L. Since you have no branch name topic, your Git could do this, producing:

              I--J   <-- origin/develop, upstream/develop
             /
    ...--G--H   <-- main (HEAD), origin/main, upstream/main
             \
              K   <-- origin/topic
               \
                L   <-- topic, upstream/topic
    

    Note that your current branch main continues to point to commit H.
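The harmless case can be reproduced like this. It is a sketch with a single hypothetical remote (cloned with -o upstream so that the remote is named upstream), and commit L is the only commit on their topic.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

git init -q -b main upstream-repo
cd upstream-repo
git commit -q --allow-empty -m "commit H"
git switch -q -c topic
git commit -q --allow-empty -m "commit L"
git switch -q main
cd ..

# -o upstream names the remote "upstream" instead of "origin".
git clone -q -o upstream "$work/upstream-repo" mine
cd mine

main_before=$(git rev-parse main)
git fetch -q upstream topic:topic      # create local branch topic from their topic

topic_id=$(git rev-parse refs/heads/topic)
their_topic=$(git rev-parse upstream/topic)
main_after=$(git rev-parse main)
current=$(git symbolic-ref --short HEAD)

echo "still on $current; topic created at $topic_id"
```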

    But if you ask your Git to:

    git fetch upstream topic:main
    

    you're now telling your Git to find that they have their topic referring to commit L, and to write commit L's hash ID into your name main. If your Git did do this, you would have:

              I--J   <-- origin/develop, upstream/develop
             /
    ...--G--H   <-- origin/main, upstream/main
             \
              K   <-- origin/topic
               \
                L   <-- main (HEAD), upstream/topic
    

    This would indicate that your current branch main's current commit is L. The problem here is that all the files in your working tree (and index) came out of commit H, not out of commit L. They will still match the files in commit H.

    Your Git therefore says no, I won't move the name main to commit L, as that would disrupt the smooth workings of your current checkout. Which it would, so don't do that. Just run:

    git fetch upstream
    

    and then, if you really want your name main to point to commit L, use git reset --hard upstream/topic to achieve that, knowing exactly what git reset --hard does.[4]
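The refusal, and the two-step alternative, can be tried safely in a scratch clone. This is a sketch under the same assumptions as before (one hypothetical local remote named upstream, placeholder identity); the exact wording of the error differs between Git versions, so the script only checks that the fetch fails and that main stays put.

```shell
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

git init -q -b main upstream-repo
cd upstream-repo
git commit -q --allow-empty -m "commit H"
git switch -q -c topic
git commit -q --allow-empty -m "commit L"
git switch -q main
cd ..

git clone -q -o upstream "$work/upstream-repo" mine
cd mine
main_before=$(git rev-parse main)

# Fetching into the branch that is currently checked out is refused.
if git fetch upstream topic:main 2>"$work/fetch-error.txt"; then
    fetch_result=accepted
else
    fetch_result=refused
fi
cat "$work/fetch-error.txt"
main_after_fetch=$(git rev-parse main)

# The alternative: plain fetch, then move main deliberately.
git fetch -q upstream
git status --porcelain                 # prints nothing here: no uncommitted work to lose
git reset -q --hard upstream/topic
main_after_reset=$(git rev-parse main)
their_topic=$(git rev-parse upstream/topic)

echo "fetch into main: $fetch_result; main is now $main_after_reset"
```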


    [3] Refspecs are used with git push as well, although their interpretation is a bit different there: for git push, the source is your repository, not the remote repository.

    [4] Remember that git reset --hard means "if I have uncommitted work, destroy it irrecoverably". Git will do that! You should probably make sure you have no uncommitted work first.