
What is happening behind the scenes, with the commits and HEAD while trying to git reset --soft and undoing it?


I issued git reset --soft HEAD~1, then decided against it and wanted to go back to my previous state.

So I searched Stack Overflow and found this answer:

Now I just want to go back to the time before I issued the reset

If you only want to cancel the git reset --soft you just did, you can look up the former HEAD commit id in the reflogs

$ git reflog
$ git reset --soft formerCommit

I issued git reset --soft formerCommit as mentioned. Then I checked HEAD using git reflog:

[screenshot of the git reflog output]

It created two more reflog entries, as you can see, but my HEAD is at the required commit. So I thought that if I made new changes and pushed them, it would just work.

But no: after making a new change and pushing, I got this error:

$ git push
To https://github.com/Arpan619Banerjee/Forkify_JS_app.git
 ! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'https://github.com/Arpan619Banerjee/Forkify_JS_app.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

However, git log does not show the two extra entries that appear in git reflog:

[screenshot of the git log output]

So now my questions are:

  1. What is the correct way to revert changes in some files (not all the files previously committed) and push again?

  2. What is the advantage of using the commands that I issued in the question while resetting if I have to force the push?

  3. After issuing git reset --soft HEAD~1, I could have unstaged the changes > made new changes > committed > forced the push.

  4. Why do I always need to force the changes? I understand that the Heads have diverged, but isn't there any other way to gracefully do this?

How can I understand the concepts? I am new to Git and learning it on my own.


Solution

  • First, direct (although not all that useful) answers to your direct questions:

    1. What is the correct way to revert changes in some files (not all the files previously committed) and push again?

    There isn't one. That is, there is no one single correct way to do this. There are many different correct ways to do this.

    2. What is the advantage of using [git reset --soft] if I have to force the push?

    If this produces the commit graph you like, that's a way to produce the commit graph you like.

    3. After issuing git reset --soft HEAD~1, I could have unstaged the changes > made new changes > committed > forced the push.

    This is not a question. It is a statement (and a true one).

    4. Why do I always need to force the [push]?

    You don't always need to; you do for these cases because you're deliberately asking some other Git repository to throw away some commit(s).

    Key concepts

    How can I understand the concepts?

    The key here is that Git is not really about files—though it stores files—nor about branch names, though it uses branch names. Git is really about commits.

    Each commit stores files. In fact, each commit stores a complete snapshot of all files. The files inside these snapshots, however, are in a special, read-only, compressed and de-duplicated format, that only Git itself understands. None of the other software on your system can make any sense of these files. (Note that the de-duplication means that it's OK that every commit stores a copy of every file, because most commits just re-use most of the files from some other commit, so that these multiple copies take almost no extra space.)

    It's important to remember that nothing about any commit can ever be changed after the commit is made. In other words, all commits are completely, totally read-only. (This is actually a general property of all internal Git objects and it ties in to their object hash IDs.)

    But if commits are read-only, and only Git can read the files, how do we ever use them? We need ordinary files. We need to be able to read them! We need to be able to write to them! So to use Git, we have to take the files out of Git. This is what git checkout—or in Git 2.23 or later, git switch—is for: to locate some particular commit and extract it. Note that Git will find the commit by its hash ID, even if we use a branch name like master. We'll get into this a bit more in a moment.

    Before we dive into that, though, we should look at exactly what a commit does for you, because there are two parts to each commit:

    • A commit has a full snapshot of all your files. That's its main data: a full copy of every file you told Git to save, at the time you told Git to save it, in the form it had in Git at that time.

    • A commit also has metadata, such as the name of the person who made it. Most of this metadata is the stuff you see in git log output: someone's name and email address, a date-and-time-stamp, and a log message, for instance. But one piece of metadata is crucial for Git itself. Each commit stores the raw hash ID of its immediate parent commit, or, for a merge commit, its parents (plural).
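
To see both parts of a commit for yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory; the identity, file name, and commit messages are made up for illustration.

```shell
# Throwaway repository in a temporary directory; all names here are made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use master regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > README.md
git add README.md
git commit -q -m "A"
echo "second" >> README.md
git commit -q -am "B"

# Raw contents of commit B: a tree line (the snapshot), a parent line,
# author and committer lines, then the log message.
git cat-file -p HEAD

# The parent hash ID stored inside B is exactly the hash ID of A.
parent_in_b=$(git cat-file -p HEAD | sed -n 's/^parent //p')
hash_of_a=$(git rev-parse HEAD~1)
echo "parent stored in B: $parent_in_b"
echo "hash ID of A:       $hash_of_a"
```

The two hash IDs printed at the end should be identical: that stored parent ID is the backwards-pointing link.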

    Most commits have just one parent. If we have a chain of such commits, we can draw them like this:

    ... <-F <-G <-H
    

    where each letter stands in for some actual commit hash ID. Commit H is simply the last commit in the chain. Because we used commit G to make commit H, commit H itself remembers commit G's hash ID, in its metadata, as its parent.

    We say that commit H points to commit G. Note that these arrows always point backwards. They have to, because everything inside a commit is read-only. But commit G also points backwards, to commit F: G's parent is F. And of course commit F points backwards as well, to yet another earlier commit.

    What this means for us is that if we can just remember the hash ID of the last commit in some chain of commits like this, we can use that commit to find all the earlier commits. Git now has a clever trick: A branch name like master simply holds the hash ID of the last commit in the chain.
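
As a small illustration of that trick, the sketch below builds a three-commit chain in a scratch repository (the commit messages F, G, H and the identity are assumptions chosen to match the drawing) and asks Git what the name master holds.

```shell
# Scratch repository; commit messages F, G, H mirror the drawing above.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "F"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

# The name master resolves to one hash ID: that of the last commit, H.
tip=$(git rev-parse master)
git log -1 --format='%H %s' "$tip"

# Starting from that one hash ID, Git walks the parent links backwards.
chain=$(git log --format=%s master | paste -sd' ' -)
echo "walk from master: $chain"
```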

    Building a new commit

    So, suppose we have a tiny repository with just three commits in it. We'll call these commits A, B, and C and draw them like this:

    A--B--C   <-- master
    

    I get lazy here (for a reason) and don't draw the commit arrows as backwards-pointing arrows, but we do have to remember that they only go backwards. That is, if we have commit B, we can't easily go forwards to C, but we can easily go backwards to A.

    To build a new commit, we start by having Git extract one of our existing commits. Let's pick commit C, by using the name master. Git extracts commit C somewhere—we'll get into this more in a bit—and lets us see and work with all of its files. We'll need to stage and commit too, and we'll go into that more in a bit as well. But one way or another we will make a new commit, which gets a new, unique hash ID.

    The new hash ID looks random. It isn't random at all: it's actually a cryptographic checksum of all of the commit's contents. It depends on every bit of every file saved in that commit, and every bit in all of the metadata for that commit, including your name, the date-and-time when you make it, and the parent hash ID. But it's way too hard to predict: you'd need to know all of this and the exact second at which you're going to make the commit, in order to predict the hash ID. So it looks random, and we'll just pretend it is for now. Because the new hash ID is big and ugly and random-looking, we'll just call it D.

    Git makes commit D so that it points back to existing commit C, like this:

    A--B--C
           \
            D
    

    Now comes the trick: because D is the last commit in the chain, Git stores D's actual hash ID into the name master. The name moves! The name now points to the new commit:

    A--B--C
           \
            D   <-- master
    

    Now, we can have more than one branch name pointing to the same commit. Suppose we made a new branch name dev first, so that we start out with this:

    A--B--C   <-- dev, master
    

    If we pick branch dev to use, we pick commit C. If we pick branch master to use, we still pick commit C. Both names point to C right now, so either way we get commit C. The difference is what happens when we make a new commit.

    If we tell Git that we'd like to use the name dev—if we git checkout dev or git switch dev—then we start out with this:

    A--B--C   <-- dev (HEAD), master
    

    That is, Git attaches this special name HEAD to one (and only one) branch name. Then when we do make new commit D, that's the name that Git updates. So this time, we'll have:

    A--B--C   <-- master
           \
            D   <-- dev (HEAD)
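
The same sequence can be tried in a scratch repository. This is only a sketch: the empty commits A through D and the identity are placeholders, not anything from the question's repository.

```shell
# Scratch repository with commits A, B, C on master.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done

git branch dev            # dev and master both point to C
git checkout -q dev       # attach HEAD to dev
git symbolic-ref HEAD     # shows which branch name HEAD is attached to

# A new commit moves only the name HEAD is attached to.
git commit -q --allow-empty -m "D"
master_tip=$(git log -1 --format=%s master)
dev_tip=$(git log -1 --format=%s dev)
echo "master -> $master_tip, dev -> $dev_tip"
```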
    

    We can add and delete branch names any time we like

    Suppose we have:

    A--B--C   <-- master (HEAD)
           \
            D   <-- dev
    

    That is, we made commit D while on dev, but then ran git checkout master or git switch master to go back to previous commit C. Our HEAD is now attached to the name master.

    Suppose we now tell Git: delete the name dev. We'll have to force Git to do this, with:

    git branch --force --delete dev
    

    or:

    git branch -D dev
    

    (older versions of Git require the uppercase -D option here, as git branch once did not know how to use --force with --delete). The result is this:

    A--B--C   <-- master (HEAD)
           \
            D   ???
    

    Commit D is still in the repository for a while, and if we know its hash ID, we can find it. But Git finds commits by branch name, so now Git can't find commit D for us—except, that is, via Git's reflogs.

    I won't go into a lot of detail here, but there is one reflog for each branch name, plus one more for HEAD itself. When you have Git delete a branch name like dev, this also deletes its reflog.[1] Fortunately, if you were on the branch recently (that is, you used git checkout or git switch to get to it), the HEAD reflog will store the hash ID. Note that reflog entries eventually expire and get discarded, and server Gits (the ones you git push to) usually don't even have reflogs.
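
Here is a hedged sketch of that rescue, again in a scratch repository with made-up commits. It deletes dev, confirms that commit D still exists, finds D's hash ID in the HEAD reflog, and re-creates the name.

```shell
# Scratch repository: A, B, C on master, then D on dev.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
git checkout -q -b dev
git commit -q --allow-empty -m "D"
hash_of_d=$(git rev-parse HEAD)

git checkout -q master
git branch -q -D dev      # forced delete: D is not reachable from master

# No branch name finds D any more, but the commit object is still there...
git cat-file -t "$hash_of_d"

# ...and the HEAD reflog still lists it (full hash, then the reflog message).
reflog_line=$(git log -g --format='%H %gs' HEAD | grep "$hash_of_d")
echo "$reflog_line"

# Re-creating a name that points to D makes it findable again.
git branch dev "$hash_of_d"
```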


    [1] This deletion of the reflog is almost certainly a mistake, one that Git has lived with for 15 years. It may someday be fixed, especially if Git acquires real database software for its name-to-value key-value store. This would allow undeleting a branch name.


    Names—and other commits that Git can find—are how Git finds commits

    We mentioned earlier that branch names hold the hash ID of the last commit in a chain. For instance, we can have:

              I--J   <-- branch2
             /
    ...--G--H   <-- master
             \
              K   <-- branch3
    

    Here, commit J is the last commit on branch2. Commit K is the last commit on branch3. Commit H is on all three branches but is the last commit on master.

    Commit G can be found in any of three ways:

    • start at master and step back once, or
    • start at branch2 and then step back three times (J, then I, H, G), or
    • start at branch3 and then step back twice (K, H, G).

    If we ignore the reflogs, these are the three ways to find G, but there is only one way to find I, J, or K. If we delete the names branch2 and/or branch3, those commits won't be find-able.

    Git will eventually discard un-find-able commits. In your own repository, this takes longer because the reflogs are there and act as an alternative way to find the commits. In a server repository, there aren't any reflogs and the commits will generally expire within just two weeks.[2]

    We can safely delete the name master, though, as long as we keep at least one of the other two names, because Git will keep H around as long as it can find J by the name branch2 and then find I and then find H, for instance.


    [2] The two-week grace period gives Git some time to get them properly named: various operations in Git create objects first, then adjust the names to be able to find them. So Git's cleanup / garbage-collection process has to give other Git commands a little bit of time to get their work done.


    We can forcibly move a name any time we like

    Suppose we have:

    ...--G--H   <-- master (HEAD)
    

    and then we make a new commit I on master:

    ...--G--H--I   <-- master (HEAD)
    

    Now suppose we decide we don't like something about commit I. Maybe one of its snapshot files is wrong. Maybe we misspelled a word in our git log message. We can make Git move the name master back one step, using git reset, like this:

              I   ???
             /
    ...--G--H   <-- master (HEAD)
    

    Where is commit I? It's still there, in the repository. We can find its raw hash ID in git reflog, if we like, because our Git has reflogs. But we can't find it using the name master, because master points to H, not I, now. We can't find it using commit H, because H points backwards, to commit G, not forwards to I.

    This is one of three things that git reset often does: it moves the branch name to which HEAD is attached. Note: the git reset command is very big and complicated and this answer is only going to talk about a few of its many jobs.
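
This is the situation from the question, so here is a minimal sketch of it, assuming a scratch repository with a made-up file.txt. It moves master back with git reset --soft HEAD~1 and then undoes that move using the HEAD reflog.

```shell
# Scratch repository: commit H, then commit I on master.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo v1 > file.txt
git add file.txt
git commit -q -m "H"
echo v2 > file.txt
git commit -q -am "I"
hash_of_i=$(git rev-parse HEAD)

# Move master back one step; commit I is no longer found through master.
git reset -q --soft HEAD~1
after_reset=$(git log -1 --format=%s)

# The reflog recorded the move; HEAD@{1} is where HEAD was just before it.
git reflog
git reset -q --soft 'HEAD@{1}'
after_undo=$(git log -1 --format=%s)
echo "after reset: $after_reset, after undo: $after_undo"
```

Both resets add an entry to the HEAD reflog, which is why git reflog shows extra lines afterwards while git log, which only follows parent links from the current commit, does not.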

    We can also use the git branch command, with the --force flag, to move any branch name to point to any commit. Well, almost any branch name: git branch will refuse to move the branch that is currently checked out. That is, if HEAD is attached to master, we can move any branch name except master, using git branch.[3] There's a reason for this, and it has to do with Git's index and your work-tree.


    [3] If you use git worktree add to create extra work-trees, they each have their own HEAD. Each one "locks down" the branch name its HEAD is attached to, in the same way.


    Git's index and your work-tree

    We noted several times so far that commits are read-only and must be extracted. The extracted files are easy to comprehend: they're just ordinary files. They live in an area that Git calls your working tree or work-tree.

    A standard (non-server, non-"bare") repository comes with a work-tree. The work-tree is not actually in the repository: all of the stuff that makes up the repository is in the .git directory / folder. The .git folder is usually at the top level of the work-tree, so they're almost side-by-side. People tend to think of these as a pair, and they work as a pair, but your work-tree is yours. You're allowed to mess with it as much as you like. It's the stuff in the .git folder that's Git's, that you should not mess with, in general.

    So, when you pick a commit—using git checkout master or git switch dev, for instance—to have and work on / with, Git will copy the committed files, which are saved in the commit, to your work-tree. The committed files are all read-only, and are saved for all time, or at least, for as long as the commit itself continues to exist. As long as Git can find the commit, its files are safely saved forever—well, provided you don't mess with Git's files in .git, and your laptop doesn't catch fire or whatever.

    This would be a nice simple picture—Git has its commits, and your work-tree has the files you'll use to make the next commit—but it's wrong in an important but subtle way. Git doesn't actually make new commits from your work-tree!

    We noted earlier that the files inside a Git commit are in a special, Git-only, frozen and de-duplicated form. To go along with that, and to make its own job much easier, when you extract a commit, Git copies the committed files to a sort of holding area that Git calls its index or staging area.[3] So this means that Git has not two, but three copies of each file. If you have, in your commit, a file named README.md, Git has three README.md files:

    • git show HEAD:README.md: this is the committed copy. It can't be changed. You can make a new commit, or check out some other commit, to select some other committed copy, but like all committed copies, it is frozen for all time.

    • git show :README.md: this is the index or staging area copy. It can be changed! It's in the frozen format, but you can replace it wholesale using git add, or even remove it entirely.

    • README.md (just a regular file): this is your copy in your work-tree. You can do whatever you like with it.

    These are the three important copies of each file. When you run git commit, Git will make its new commit by packaging up all the index copies. Those become new snapshots, which live forever, or at least, live as long as you and Git can find the commit's hash ID.

    Until you make a commit, the index and work-tree copies of the file are just temporary, though. They can be replaced or removed! They are not permanent like committed copies. The one in your work-tree isn't even in Git, in most senses. (The staged one is kind of halfway-into Git.)

    Note that there is a staged copy of every file, not just files you have just git add-ed. What git add does is replace the staged copy with an update from your work-tree. If you modify the work-tree copy, then update the staged copy, now the staged copy doesn't match the HEAD copy. If you un-modify the work-tree copy and stage again, now the staged copy does match the HEAD copy again.
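
The three copies can be made visibly different in a scratch repository. In this sketch the file contents ("committed", "staged", "work-tree") are placeholders chosen so that each command's output says which copy it read.

```shell
# Scratch repository with one committed README.md.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "committed" > README.md
git add README.md
git commit -q -m "add README"

echo "staged" > README.md
git add README.md               # replace the index copy
echo "work-tree" > README.md    # change only the ordinary file

git show HEAD:README.md         # the frozen copy in the current commit
git show :README.md             # the copy in the index / staging area
cat README.md                   # the ordinary file in the work-tree
```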


    [3] Technically, what's in the index / staging-area is not a copy of each file, but rather information about the file: its name, its mode, and an internal Git blob hash ID. The snapshot is stored as an internal Git blob object. But Git handles all of this invisibly for you, and for the most part, you can just think of the index as holding a copy of the file, in Git's frozen format. The only time this breaks down is if you use git ls-files --stage or git update-index to deal with Git at an extra-low level.


    git status runs two comparisons

    When you use git status—which is a good idea; it tells you a lot of useful stuff—it actually runs two compare operations:

    • First, it compares the HEAD commit to the index. For each file that is the same, Git says nothing. For each file that is different, Git says staged for commit.

    • Then it compares the index to your work-tree. For each file that is the same, Git says nothing. For each file that is different, Git says not staged for commit.

    Files that are in your work-tree, but not in Git's index, are a bit of a special case: they are called untracked. We won't go into more detail here, but note that the set of files that is in Git's index changes, as you git add or git rm files, and it changes when you use git checkout or git switch to extract some other commit. So whether a file is tracked or untracked depends on what's in Git's index right now, and you can change that.

    Sometimes, changing what's in the index also tells Git to overwrite your work-tree in some way, too, though. That's particularly true for git checkout, since it has to fill in both Git's index and your work-tree. But it's also true for git reset, depending on how you run it.
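
The two comparisons can also be run one at a time with git diff. This sketch uses three made-up files in a scratch repository, one for each state, and then shows the combined view from git status.

```shell
# Scratch repository with two committed files.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo a > staged.txt
echo b > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "base"

echo a2 > staged.txt
git add staged.txt              # index now differs from HEAD
echo b2 > unstaged.txt          # work-tree now differs from index
echo c > untracked.txt          # in the work-tree, not in the index

git diff --cached --name-only             # comparison 1: HEAD vs index
git diff --name-only                      # comparison 2: index vs work-tree
git ls-files --others --exclude-standard  # untracked files
git status --short                        # all of the above in one listing
```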

    Your kind of git reset is about doing 1, 2, or 3 things

    The git reset command is big and we won't talk about all of it, but the kind of reset you're doing:

    git reset --soft <hash>
    

    uses a version of git reset that does up to three things:

    1. It can move a branch name. You pick a commit hash ID—by name, or by cutting and pasting an actual hash ID, for instance—and you are telling Git: move the current branch name, to which HEAD is attached, so that it points to that commit.

    2. It can erase-and-reload Git's index. Again, you pick a commit, and Git will load its own index from that commit. Files that were in the index before, now aren't any more. The files that are in the index now are the ones from that commit.

    3. And, it can erase-and-reload your work-tree. Any work-tree files that were in the index, but should not be in the index at all, will be removed; any work-tree files that need to be changed in the index will be replaced in your work-tree; and for any files that are in the commit you're moving to, but weren't in Git's index before, Git will copy those files into its index and your work-tree.

    That last step—of updating your work-tree—only happens if you tell Git to do it, using git reset --hard. With the default git reset --mixed, Git will only do steps 1 and 2. If you use git reset --soft, Git will only do step 1.

    Note that Git will always do step 1—it will always move the current branch—but you can pick the current commit as the commit to move to. That is, if your current branch name currently names commit H, and you say git reset --hard hash-of-H, the name "moves" so that it now points to commit H, just like it did before. So that's what git reset --hard, with no commit hash ID, is for: it "moves" the branch without moving it, but then also resets Git's index and resets your work-tree.

    The git reset --soft variant makes Git do step 1—move the name—and then stop. So this only makes any sense if you are planning to really move the name. You pick a new commit hash ID, and the current name moves there, with no other changes.
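
To compare the three modes side by side, here is a sketch in a scratch repository. The helper function state and the file file.txt are made up for this illustration; the helper prints the commit the branch names, the index copy, and the work-tree copy.

```shell
# Scratch repository: H has file.txt = v1, I has file.txt = v2.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo v1 > file.txt
git add file.txt
git commit -q -m "H"
echo v2 > file.txt
git commit -q -am "I"
hash_of_i=$(git rev-parse HEAD)

# Hypothetical helper: where the branch points, plus index and work-tree copies.
state() {
  echo "branch=$(git log -1 --format=%s) index=$(git show :file.txt) work-tree=$(cat file.txt)"
}

git reset -q --soft HEAD~1          # step 1 only
after_soft=$(state)
git reset -q --hard "$hash_of_i"    # put everything back at I

git reset -q --mixed HEAD~1         # steps 1 and 2
after_mixed=$(state)
git reset -q --hard "$hash_of_i"

git reset -q --hard HEAD~1          # steps 1, 2, and 3
after_hard=$(state)

echo "soft:  $after_soft"
echo "mixed: $after_mixed"
echo "hard:  $after_hard"
```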

    If you have:

    ...--G--H--I   <-- master (HEAD)
    

    and you use git reset --soft hash-of-H, you get:

              I
             /
    ...--G--H   <-- master (HEAD)
    

    Git's index and your work-tree are unchanged, so you can now run git commit and make a new commit. If you like, you can edit files and git add them, to update your work-tree and copy the update into Git's index first. Either way you can make a new commit with a new date-and-time and maybe a different log message:

              I
             /
    ...--G--H--J   <-- master (HEAD)
    

    Because J is a different commit, it has a different hash ID.

    git push is about changing things in some other Git

    This brings us to git push. It's all well and good to make changes in your Git repository, by adding and maybe removing (or shoving aside) some commits and moving various branch names around. But that only affects your repository. There's some other set of Git repositories and you might like to make changes to them as well.

    If you can log in on those machines, you can make commits there. But the commits you make there will be made at a different time, and will have different hash IDs—and maybe you can't log in on those machines anyway. For instance, maybe that other repository is on GitHub.

    Fortunately, Git has git push as a built-in operation. This has your Git call up some other Git. Your Git and their Git will then have a conversation of sorts.[4] Your Git will list, for their Git, some commit hash ID. Their Git can look at their repository and see if they have that hash ID. If they do, they have that commit now. If not, they don't have that commit, and your Git will send it. Your Git is then obligated to offer the commit's parent(s), too, and will send it / them as needed, and their parents, and so on.

    The result of all of this is that if they have:

    ...--G--H--I   <-- master
    

    in their repository, and you have:

              I
             /
    ...--G--H--J--K--L   <-- master (HEAD)
    

    your Git will offer L and they will say "please send it"; your Git will then offer and send K and J as well, and offer H, but they'll say "no thanks, I already have that one".

    So, by this process, your Git sends to them any commits that they don't have but will need. Then your Git sends some final requests or, if you use the force option, commands: "If it's OK, please set your master to point to commit L" or "Set your master to point to commit L!"

    If they obey this polite request or command, they will now have:

              I
             /
    ...--G--H--J--K--L   <-- master
    

    just like you do. But they do not have any reflogs. So they won't be able to find commit I! Their Git will remove commit I fairly soon: not right away, but it won't stay safe for at least a month the way it will in your repository.

    If you don't use --force, their Git will say: No, if I set my master to point to L, I'll lose some commit(s) off my master. They don't even check to see if some other branch name protects commit I. They just say no. Your Git tells you about this by printing the non-fast-forward error.
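
The rejection can be reproduced without GitHub. This sketch assumes a local bare repository standing in for "their Git" (the directory names server.git and mine are made up), pushes H and I, replaces I with J locally, and then tries the polite and the forced push.

```shell
# A bare repository plays the server; "mine" is our own clone-like repository.
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git init -q "$work/mine" && cd "$work/mine" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/server.git"

git commit -q --allow-empty -m "H"
git commit -q --allow-empty -m "I"
git push -q origin master                 # server master now points to I

git reset -q --soft HEAD~1                # shove I aside locally
git commit -q --allow-empty -m "J"        # J's parent is H, not I

# Polite request: accepting it would drop I from their master, so they refuse.
if git push -q origin master; then plain=accepted; else plain=rejected; fi

# Forced push: a command rather than a request.
if git push -q --force origin master; then forced=accepted; else forced=rejected; fi

echo "plain push: $plain, forced push: $forced"
```

The first push prints the same non-fast-forward error shown in the question; the forced one moves their master to J and leaves I unfindable on the server.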


    [4] Note that the closest thing there is to the opposite of push is actually git fetch, which also has a conversation with the other Git. The difference is that with this conversation, your Git gets things from their Git. It ends with your Git updating your remote-tracking names.

    The git pull command, which might seem like the opposite, isn't: it means run git fetch, then run a second Git command—often git merge. The git push command doesn't run any second Git command; it never does any merging.


    Another way to get what you might want

    What if you have:

    ...--G--H--I   <-- master
    

    and you don't like the arrangement of some file(s) in I? Well, that's not really a problem. You don't have to remove I from this chain. Just check out master, so that it's the HEAD commit and you have all the files active as usual:

    ...--G--H--I   <-- master (HEAD)
    

    and then change your work-tree and index so that it contains the files you would like. You can get some file(s) from commit H, or from commit G, directly. In Git 2.23 or later, you can use the git restore command to do this:

    git restore --source=<hash-of-G> --staged --worktree path/to/file.ext
    

    This copies the committed path/to/file.ext file out of commit G, into both the index / staging-area (--staged) and your work-tree (--worktree).

    If your Git is older than Git 2.23, you can do this with the git checkout command (the git checkout command is at least two commands rolled into one).[5] So in older Git, or if you have this as a habit already, you can do:

    git checkout <hash-of-G> -- path/to/file.ext
    

    which copies the file to both Git's index and your work-tree, just like git restore.[6]

    In any case, once you have updated your index and work-tree to have what you'd like them to have, you can run git commit to make a new commit:

    ...--G--H--I--J   <-- master (HEAD)
    

    New commit J holds the snapshot you want, and has whatever log message you choose to put in. Its parent is existing commit I. You can now call up some other Git and send over new commit J to add on to their master, which in their repository points to the (identical to yours and therefore same hash ID) commit I. This adds to their master, without dropping any commits, so they will take this git push without complaint.[7]
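
Put together, the add-a-commit approach looks roughly like this. It is a sketch under the same assumptions as before: a local bare repository named server.git stands in for the remote, and file.txt with its "good" and "bad" contents is made up. It uses the git checkout form so that it also runs on Git older than 2.23.

```shell
# A bare repository plays the server; H has the good file, I has the bad one.
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git init -q "$work/mine" && cd "$work/mine" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/server.git"

echo good > file.txt
git add file.txt
git commit -q -m "H"
echo bad > file.txt
git commit -q -am "I"
git push -q origin master

# Copy file.txt from the previous commit into both the index and the work-tree.
# With Git 2.23 or later this could instead be:
#   git restore --source=HEAD~1 --staged --worktree file.txt
git checkout -q HEAD~1 -- file.txt
git commit -q -m "J"

# J simply adds on to I, so an ordinary push is enough.
if git push -q origin master; then result=accepted; else result=rejected; fi
echo "push: $result, file.txt in J: $(git show HEAD:file.txt)"
```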


    [5] In Git 2.23, the Git folks finally acknowledged that git checkout has too many jobs, and split them up. Someday, perhaps, they might do this for other commands like git reset (in fact, git restore takes on a lot of what you can do with git reset, and would allow git reset to be a smaller command). But Git tries to keep its commands backwards-compatible through upgrades, so git checkout still works, and git reset can't be trimmed back.

    [6] Unlike git restore, you cannot choose to have the file go into just one of these two places, at least not using git checkout.

    [7] On GitHub, where you can set some branches to be protected, this requires that the branch not be "protected", or that GitHub has been told that you have administrative privileges and can git push to a protected branch. Protected branches are not built in to Git itself: they're an add-on. Most web hosting sites seem to have added them on by now, though.