
How do I push code from a new local folder to an existing GitHub repository's main branch and keep commit history?


I have an existing repository in which the main branch has several commits.

I have a local folder on my desktop that contains code that I wish to commit to the main branch so that it completely replaces all the code in the previous commit. I wish to retain the commit history of the main branch.

How do I connect my local folder to the existing GitHub repository, point it to the main branch, and commit to this branch such that the commit replaces all code in the previous commit yet keeps all the commit history up to this point?

I have done the following:
# Initialize the local directory as a Git repository.
git init

# Add files
git add .

# Commit your changes
git commit -m "First commit"

# Add remote origin
git remote add origin <Remote repository URL>
# <Remote repository URL> looks like: https://github.com/user/repo.git

# Verifies the new remote URL
git remote -v

# Push your changes to main
git push origin main

But it says git error: failed to push some refs to remote


Solution

  • Git is a tool, or set of tools, not a solution. (Here is another article, perhaps better, on the difference.)

    This has advantages. In particular, since it is a set of tools, you can build almost anything you want with it. It doesn't just turn out birdhouses, or bookcases, tables or filing cabinets, or oboes and guitars and violins, or whatever else you might be able to make in a properly equipped shop: if you know what you're doing, it can make all of these. But you will need to be a master craftsman, or at least experienced, to build a good house with it—even if it's just a bird or dog house.

    On the plus side, it seems that you already understand that in Git, the commits are the history. This means you're off to a good start!

    You will need to make one big decision up front: How do you want the history to look?

    An illustration, starting with git log

    If we look at the Git repository for Git itself, we see stuff like this (lightly edited to take out @ signs to cut down on spam, perhaps):

    commit eb27b338a3e71c7c4079fbac8aeae3f8fbb5c687 (HEAD -> master, origin/master)
    Author: Junio C Hamano <gitster pobox.com>
    Date:   Wed Jul 21 13:32:38 2021 -0700
    
        The sixth batch
        
        Signed-off-by: Junio C Hamano <gitster pobox.com>
    
    commit fe3fec53a63a1c186452f61b0e55ac2837bf18a1
    Merge: 33309e428b d1c5ae78ce
    Author: Junio C Hamano <gitster pobox.com>
    Date:   Thu Jul 22 13:05:56 2021 -0700
    
        Merge branch 'bc/rev-list-without-commit-line'
        
        "git rev-list" learns to omit the "commit <object-name>" header
        lines from the output with the `--no-commit-header` option.
        
        * bc/rev-list-without-commit-line:
          rev-list: add option for --pretty=format without header
    
    
    commit 33309e428bf85a0f06e4d23b448bf5400efe3f17
    Merge: bb3a55f6d3 351bca2d1f
    Author: Junio C Hamano <gitster pobox.com>
    Date:   Thu Jul 22 13:05:56 2021 -0700
    
        Merge branch 'ab/imap-send-read-everything-simplify'
        
        Code simplification.
        
        * ab/imap-send-read-everything-simplify:
          imap-send.c: use less verbose strbuf_fread() idiom
    

    This subset of history—a small snippet of just three commits—is missing something: two of the displayed commits, fe3fec53a63a1c186452f61b0e55ac2837bf18a1 and 33309e428bf85a0f06e4d23b448bf5400efe3f17, are merge commits, each with two parent commits instead of just one. The two parents of the middle commit fe3fec53a63a1c186452f61b0e55ac2837bf18a1 are (abbreviated) 33309e428b and d1c5ae78ce, as you can see from the first Merge: line in the output above. One of those two commits is the other merge (the third commit shown), but where is commit d1c5ae78ce? It's more than 40 commits down, and we can't really see where it lives in the history unless we add --graph or similar.

    The graph option makes git log draw a crude ASCII-art attempt at the kind of fancy graphs you'll see in some of the answers to Pretty Git branch graphs (a question, and set of answers, worth reading carefully several times, though it's quite long). I'm particularly fond of the output from gitdags for tutorials, though it's not at all suited to automated work (it requires careful hand construction of LaTeX input).
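
    To see what --graph adds without cloning Git's own repository, a throwaway repository with a single merge is enough. This is only a minimal sketch: the temporary directory, the demo identity, and the branch name topic are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)    # throwaway directory, left in place for inspection
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults

# Two lines of development that diverge after A and are then merged.
git commit -q --allow-empty -m "A"
git checkout -q -b topic
git commit -q --allow-empty -m "T"
git checkout -q master
git commit -q --allow-empty -m "B"
git merge -q --no-ff -m "Merge branch 'topic'" topic

# Plain log: a flat list; the merge's two parents appear only on its "Merge:" line.
git --no-pager log

# With --graph: the second parent gets its own line of ancestry.
git --no-pager log --graph --oneline

merge_second_parent=$(git rev-parse HEAD^2)
topic_tip=$(git rev-parse topic)
```

    The second parent of the merge is the tip of topic, which is exactly the kind of commit that the flat listing pushes far down the output.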

    In the end, though, the decision you want to make for your case comes down, I think, to the answer to just one question: Do you want a linear history, or a branch-y one? That is, assuming the existing repository has just a few commits in it that we can draw as A-B-C-D, we'll call your new commit E. Do you want the history to look like:

    A--B--C--D--E   <-- main (or master)
    

    or do you want it to look like:

    A--B--C--D
              \
               M   <-- main (or master)
              /
    E--------'
    

    ?

    In either case, you need one repository in which all the commits exist

    To make this one repository in which all the commits exist, you can begin with exactly what you did already:

    git init
    git add .
    git commit -m "First commit"
    git remote add origin <url>
    

    So far, you have made the commit that we will call E, or a commit that looks almost exactly like the commit that we will call E eventually. The branch name that locates this commit in your Git repository is almost certainly master; if you want this to be main instead, now is a good time to rename it:

    git branch -m main
    

    If you want to leave it as master (or if you set your Git up to use main so that it's already main[1]), you can leave out this step. The actual branch name is not important (well, except to humans ... which, OK, makes it important 😀; the point here is that Git doesn't care what name you use, as long as there is some name by which to find the commit).
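
    As a rough sketch of the renaming step, the following runs it in a throwaway repository. The temporary directory and demo identity are made up; the initial name is forced to master only to imitate the historical default. The two commented-out alternatives need Git 2.28 or later and are shown for reference rather than executed.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)    # throwaway directory, left in place for inspection
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # imitate the historical default name

git commit -q --allow-empty -m "First commit"
before=$(git symbolic-ref --short HEAD)

# Rename the current branch; the commit it points to is untouched.
tip_before=$(git rev-parse HEAD)
git branch -m main
after=$(git symbolic-ref --short HEAD)
tip_after=$(git rev-parse HEAD)

echo "branch was '$before', is now '$after'"

# Alternatives on Git 2.28 or later (not run here):
#   git init -b main                              # choose the name at init time
#   git config --global init.defaultBranch main   # choose it for all future repositories
```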

    You do not yet have the A-B-C-D commits, so the next step is to obtain them:

    git fetch origin
    

    or just:

    git fetch
    

    Your Git now calls up the other Git at origin and obtains the existing commits (however many there really are; I'm just going to assume four, and one branch name, which I'm going to assume is master). Your Git renames their branch names to make your remote-tracking names, by sticking origin/ in front of each name.[2]

    You now have, in your local repository, this:

    E   <-- master
    
    A--B--C--D   <-- origin/master
    

    We're now ready to go on, but the next steps depend on whether you want a linear history or a branch-y / merge-y history.
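
    Before moving on, the state drawn above can be reproduced locally. In this sketch a bare repository in a temporary directory stands in for the GitHub repository, with two commits instead of four; the file names old.txt and new.txt and the demo identity are made up for illustration.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)    # throwaway directory, left in place for inspection

# Stand-in for the existing remote repository: a bare repo with two commits.
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo old > old.txt
git add .
git commit -q -m "A"
echo older >> old.txt
git commit -q -am "B"
git push -q "$work/remote.git" master

# Stand-in for the new local folder: init, add, commit, add remote, fetch.
mkdir "$work/new"
cd "$work/new"
git init -q
git symbolic-ref HEAD refs/heads/master
echo new > new.txt
git add .
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git fetch -q origin

# Two separate chains: our commit E, and the fetched history under origin/master.
git --no-pager log --all --graph --oneline --decorate

fetched_tip=$(git rev-parse --verify origin/master)
# merge-base exits non-zero when the two histories share no commit.
if git merge-base master origin/master >/dev/null; then related=yes; else related=no; fi
echo "histories related: $related"
```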


    [1] This requires a pretty modern Git (2.28 or later), so that you can either pass an initial-branch argument to git init (that wasn't in your original quoted commands, so I am guessing you did not do this) or configure the initial branch name.

    [2] Technically, your Git puts these remote-tracking names into an entirely separate namespace, so that even if you name a (local) branch origin/foo, Git won't confuse it with a renamed foo-from-origin that becomes the remote-tracking name origin/foo. You'll just have two names that display as origin/foo, which will confuse the heck out of humans, but Git will be fine. But don't do that, it's too confusing.


    To make a merge-y history, we'll use git merge

    Let's say you want:

    A--B--C--D
              \
               M   <-- master
              /
             E
    

    which is the same drawing I made earlier, I just slid E over further; we can also put E on the top line, and/or draw this as:

              E
               \
    A--B--C--D--M   <-- master
    

    or:

            E--M   <-- master
              /
    A--B--C--D
    

    and of course we can change the branch name from master to main to zebra or zorg or whatever, any time we like, since Git has no interest in the actual name. It's just that it's a good idea to pick one and stick with it, to avoid confusing humans.

    There's a hitch or two, though:

    • git merge, since Git version 2.9, refuses to merge "unrelated histories" by default. We can overcome this with --allow-unrelated-histories (assuming a Git version 2.9 or later; prior to 2.9, this option does not exist but is not required).
    • Merging unrelated histories is fraught.

    There's a really big escape hatch that works in our favor, though, for the latter: the -s ours strategy.[3] This strategy tells Git: completely ignore the files in the commit being merged. Use our committed files instead.

    So:

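    git merge -s ours --allow-unrelated-histories origin/master
    

    (assuming your remote-tracking name, in your repository, is origin/master, and Git 2.9 or later; on older versions, leave out --allow-unrelated-histories) produces exactly the graph we want. Then: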

    git push -u origin master
    

    works right off and sends our two new commits, E and M, to origin, which they add to their master, so that their master identifies commit M by its raw hash ID, just as our master identifies commit M by its raw hash ID.

    (Note that the commits are shared. We and they use the same commits, with the same hash ID. The branch names are not shared: we have ours and they have theirs. We now arrange to store the same hash ID into our master as they store into their master, but these are independent and will, over time, evolve independently until we deliberately synchronize them again. Every new commit gets a new, unique hash ID, so it's always easy for two Gits to share new commits with each other: they always have unique hash IDs, no matter where they were made. Two commits only have the same hash ID if they were already shared, and the one Git got that commit from the other Git already.)
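
    The merge-y route can be tried end to end against a local stand-in before touching the real repository. This is a sketch under assumptions: a bare repository in a temporary directory plays the role of GitHub, it holds two commits rather than four, and the file names and demo identity are made up. It assumes Git 2.9 or later, hence --allow-unrelated-histories.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)    # throwaway directory, left in place for inspection

# Stand-in for the existing remote repository.
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo old > old.txt
git add .
git commit -q -m "A"
echo older >> old.txt
git commit -q -am "B"
git push -q "$work/remote.git" master

# The new local folder, prepared as in the text.
mkdir "$work/new"
cd "$work/new"
git init -q
git symbolic-ref HEAD refs/heads/master
echo new > new.txt
git add .
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git fetch -q origin

e=$(git rev-parse master)           # our commit E
d=$(git rev-parse origin/master)    # tip of the existing history

# Merge, keeping only our files; --no-edit avoids opening an editor.
git merge -q -s ours --allow-unrelated-histories --no-edit origin/master
git push -q -u origin master

m=$(git rev-parse master)
first_parent=$(git rev-parse "$m^1")
second_parent=$(git rev-parse "$m^2")
remote_tip=$(git --git-dir="$work/remote.git" rev-parse master)
files=$(git ls-files)
git --no-pager log --graph --oneline
```

    The merge commit M has E and the old tip as its parents, contains only the new files, and the push is an ordinary fast-forward from the remote's point of view.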


    [3] This is not to be confused with the -X ours eXtended strategy option, which won't work at all here. The -s option to git merge supplies a strategy. The -X option to git merge passes its argument to that strategy, as an option. The git merge documentation calls this a strategy-option, which is ridiculously easy to confuse with the -s strategy option. So I say: call it an eXtended-option, or eXtended-strategy-option.
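
    The difference between the two spellings shows up in which files survive the merge. The sketch below runs both on copies of the same starting point; the local bare repository, file names, demo identity, and the branch names with-s and with-X are all made up for illustration. It deliberately uses non-overlapping file names, so it says nothing about how -X ours treats paths present on both sides.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)    # throwaway directory, left in place for inspection

# Stand-in for the existing remote repository (contains only old.txt).
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo old > old.txt
git add .
git commit -q -m "A"
git push -q "$work/remote.git" master

# The new local folder (contains only new.txt).
mkdir "$work/new"
cd "$work/new"
git init -q
git symbolic-ref HEAD refs/heads/master
echo new > new.txt
git add .
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git fetch -q origin

# -s ours: a strategy that ignores the other side's files entirely.
git checkout -q -b with-s master
git merge -q -s ours --allow-unrelated-histories --no-edit origin/master
s_files=$(git ls-files)

# -X ours: an option to the default strategy; it only settles conflicts,
# so files that exist solely on the other side are still brought in.
git checkout -q -b with-X master
git merge -q -X ours --allow-unrelated-histories --no-edit origin/master
x_files=$(git ls-files)

echo "with -s ours:"; echo "$s_files"
echo "with -X ours:"; echo "$x_files"
```

    With -X ours the old file comes back, which is not the "completely replace" result the question asks for.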


    To make a linear history, we have to work harder

    If we want the history to resemble:

    A--B--C--D--E
    

    we have a problem, because we already have commit E and no part of any existing commit can ever be changed.

    Commit E's real hash ID, though, is some big ugly random-looking hash ID that nobody can remember, and that only exists in our local repository at this point. Suppose we take our existing commit E and copy it to a slightly different, new-and-improved commit E' that we would draw like this:

    E   <-- master
    
               E'  <-- new-master
              /
    A--B--C--D   <-- origin/master
    

    Now we rename our existing master to old-master, and our existing new-master to master:

    E   <-- old-master
    
               E'  <-- master
              /
    A--B--C--D   <-- origin/master
    

    Then, once we're sure we like the result, we just delete the name old-master entirely. The commit itself lingers in our repository for a while (at least 30 days by default, though this gets a bit complicated[4]), but eventually it goes away. Since nobody can remember the old hash ID in the first place, and git log doesn't show it any more, we can just pretend that we never had an E and that E' is called E, or whatever, and this gives us a simple linear history. I'll draw it as E' though:

    A--B--C--D--E'  <-- master
    

    (origin/master still points to D; I just didn't draw it).

    We can then run git push origin master to add E' to their master, in the same way we would run git push origin master to add E-and-M in the branch-y case. But we still have to make E' somehow.

    The easiest way to do this is to use either git commit-tree or git read-tree, both of which are plumbing commands and rather specialized. I'll use git read-tree in my example here:

    git branch -m master old-master
    git checkout -b master origin/master
    git read-tree -m -u old-master
    git commit -C old-master
    git log                     # inspect the result
    git push origin master      # note: `-u` not required
    git branch -D old-master    # whenever you like
    

    An explanation of this sequence follows; first, let's get the footnotes out of the way.


    [4] The sticking-around is based on three factors. The first two are an object minimum lifetime, which defaults to 14 days, and a reflog entry lifetime, which defaults to 30 days. When we delete the branch, we delete the branch's reflog too. Some consider this a bug in Git and there's a very-back-burner project, or at least idea, to change this, but it's true for now. However, the HEAD name in Git has its own separate reflog, and that reflog will keep the commit around for 30 days from the time we made it, by default. The third factor is that git gc --auto, which various Git commands run automatically for you, doesn't actually run git gc until Git thinks it would probably be profitable. It's very hard to predict precisely when that will be.
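
    The lingering described in this footnote is easy to observe. In the sketch below, the throwaway repository, the demo identity, and the branch name scratch are made up; the configuration keys named in the comments are the ones I believe govern the two lifetimes.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)    # throwaway directory, left in place for inspection
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "A"
git checkout -q -b scratch
git commit -q --allow-empty -m "E"
e=$(git rev-parse HEAD)

# Leave the branch and delete it; its own reflog goes away with it.
git checkout -q master
git branch -q -D scratch

# No branch name finds E any more, but the object is still in the repository...
kind=$(git cat-file -t "$e")
# ...and HEAD's reflog still records it.
in_reflog=$(git reflog --format=%H | grep -c "$e")
echo "object type: $kind, HEAD reflog entries naming it: $in_reflog"

# Lifetimes (defaults as described in the footnote):
#   gc.reflogExpireUnreachable   reflog entry lifetime for unreachable commits
#   gc.pruneExpire               minimum age before a loose object may be pruned
```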


    Explanation

    The branch rename is straightforward:

    git branch -m master old-master
    

    This renames our existing master to old-master. (Since we're on master we could just run git branch -m old-master; I typed in the full command out of habit, and the above is a cut-and-paste.)

    The next command is equally straightforward:

    git checkout -b master origin/master
    

    This creates a new master branch name locally, using origin/master as the commit hash ID. That gives us a pointer to commit D (assuming four commits on origin/master). As a side effect, the new master already has an upstream set, namely origin/master.

    Now we want to get Git's index and our working tree to match commit E. This is where the funky git read-tree command comes in:

    git read-tree -m -u old-master
    

    The git read-tree command is complicated and is one of Git's true workhorses, but it's very old: it implements part of git merge and part of git checkout. Here, we're using it to run the equivalent of git checkout, without updating the current branch name at all. A longer but clearer way to accomplish the same thing is to use git rm -rf . to remove every file, followed by git checkout old-master -- . to repopulate Git's index and our working tree. Should git read-tree ever vanish, that would be a way to do what we want. (Another would be to use git checkout followed by git symbolic-ref, and yet another would be to use git commit-tree followed by git reset or git merge --ff-only: there are many tools in this workshop that will get you from point A to point B.)
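
    One of the alternatives mentioned above, git commit-tree followed by git merge --ff-only, might look like the following. It is a sketch, not the answer's own recipe: a local bare repository stands in for GitHub, and the file names, demo identity, and retyped commit message are assumptions made for illustration.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)    # throwaway directory, left in place for inspection

# Stand-in for the existing remote repository.
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo old > old.txt
git add .
git commit -q -m "A"
git push -q "$work/remote.git" master

# The new local folder, prepared as in the text.
mkdir "$work/new"
cd "$work/new"
git init -q
git symbolic-ref HEAD refs/heads/master
echo new > new.txt
git add .
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git fetch -q origin
d=$(git rev-parse origin/master)

git branch -m master old-master
git checkout -q -b master origin/master

# Build E' directly: E's tree, with the existing tip as its parent.
new=$(git commit-tree -p origin/master -m "First commit" "old-master^{tree}")

# Move master forward to E'; this also updates the index and working tree.
git merge -q --ff-only "$new"

parent=$(git rev-parse HEAD^)
tree_new=$(git rev-parse "HEAD^{tree}")
tree_old=$(git rev-parse "old-master^{tree}")
status=$(git status --porcelain)
git --no-pager log --oneline
```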

    The git commit we use is pretty much the normal git commit, with just one option added, namely -C old-master:

    git commit -C old-master
    

    The -C option tells Git to take the commit message from the specified commit. Here, we point git commit to existing commit E to grab the commit message. You can leave out the -C option and just type in an all-new commit message, if you like.

    The final git push is like any git push, but because we created our master with its upstream already set to origin/master, we don't need the -u option that we needed in the branch-y / merge-y construction.
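
    The linear sequence can be rehearsed end to end on a local stand-in before running it against the real repository. In this sketch a bare repository in a temporary directory plays the role of GitHub and holds two commits rather than four; the file names and demo identity are made up for illustration.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)    # throwaway directory, left in place for inspection

# Stand-in for the existing remote repository.
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo old > old.txt
git add .
git commit -q -m "A"
echo older >> old.txt
git commit -q -am "B"
git push -q "$work/remote.git" master

# The new local folder, prepared as in the text.
mkdir "$work/new"
cd "$work/new"
git init -q
git symbolic-ref HEAD refs/heads/master
echo new > new.txt
git add .
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git fetch -q origin
d=$(git rev-parse origin/master)

# The sequence from the text.
git branch -m master old-master
git checkout -q -b master origin/master
git read-tree -m -u old-master     # index and working tree now match E
git commit -q -C old-master        # reuse E's message and author
git push -q origin master

parent=$(git rev-parse HEAD^)
tree_new=$(git rev-parse "HEAD^{tree}")
tree_old=$(git rev-parse "old-master^{tree}")
remote_tip=$(git --git-dir="$work/remote.git" rev-parse master)
local_tip=$(git rev-parse master)
status=$(git status --porcelain)
git --no-pager log --oneline       # B's history with E' on top

git branch -q -D old-master        # whenever you like
```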

    Which is better: linear history or branchy/mergy?

    Which is better, chocolate ice cream, or strawberry? Which is better, learning to fly kites, or learning to fly helicopters? It's partly a matter of opinion and partly a matter of where you want to go and how you want to get there, n'est-ce pas?