
How to fix a local copy messed up with git reset HASH



I added some changes to a local file and committed them (A). I then added more commits (B and C) and realized that I didn't want the change from commit (A), so I went ahead and did a git reset --hard <HASH>, which removed the previous commit from the git log but kept the change I added in (A) in the file. When I now change this file back to its original state, Git marks it as changed, which I don't want. I want the file to reflect the origin. How do I get this done from here?

(I thought this is what reset --hard does)


Solution

  • As already answered in a comment, you can't get the final effect you want with git reset. The details here are, however, quite messy, because git reset is something of a kitchen-sink command.

    I added some changes to a local file and then committed them (A) ...

    The thing to realize here is that every Git commit is a full snapshot of every file. When you change one file and make a new commit, you make a new full snapshot.

    This doesn't take a whole lot of disk space, because the files that are inside Git commits are not ordinary files. They're stored in a special, compressed, Git-ified, and de-duplicated form. So new commit A simply re-uses all the old files from earlier commit α, except for the one modified file, which gets a new snapshot.[1]
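    The de-duplication can be observed directly. The following is a minimal sketch in a throwaway repository; the file names (changed.txt, untouched.txt) and the identity settings are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit "alpha": two files.
echo "one" > changed.txt
echo "two" > untouched.txt
git add changed.txt untouched.txt
git commit -q -m "alpha"

# Commit "A": modify only one of them.
echo "one, edited" > changed.txt
git commit -q -am "A"

# Both snapshots list both files...
git ls-tree HEAD~1
git ls-tree HEAD

# ...but the untouched file has the same blob ID in both commits,
# so its content is stored only once.
untouched_before=$(git rev-parse HEAD~1:untouched.txt)
untouched_after=$(git rev-parse HEAD:untouched.txt)
changed_before=$(git rev-parse HEAD~1:changed.txt)
changed_after=$(git rev-parse HEAD:changed.txt)
echo "untouched: $untouched_before -> $untouched_after"
echo "changed:   $changed_before -> $changed_after"
```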

    I then added more commits (B and C) and realized that I didn't want the change from commit (A), so I went ahead and did a git reset --hard <HASH> ...

    This kind of git reset is about going to a specific commit.

    Commits, in Git, are laid out a bit like a string of pearls.

    Each commit has a unique but random-looking hash ID, which is just a hexadecimal encoding of a very large number. It's impossible to know the hash ID of a commit unless you have all the data from the commit or are simply given the hash ID directly. Git handles this by having each commit also store the hash ID of its immediate predecessor (along with the snapshot and other metadata). Git then only needs some way to store the hash ID of the last commit in the chain, because each commit points backwards to the previous one:

    δ <-γ <-β <-α <-A <-B <-C   <-- your-branch
    

    The letters (A, B, and C from your own question, plus some Greek ones in backwards order for earlier commits in the chain) stand in, here, for the random-looking hash IDs. When you made commit A the chain went from:

    δ <-γ <-β <-α   <-- your-branch
    

    to:

    δ <-γ <-β <-α <-A   <-- your-branch
    

    because Git added a new commit A, holding a full snapshot, plus the hash ID of the previous end of the chain. Then, when you made commit B, you got:

    δ <-γ <-β <-α <-A <-B   <-- your-branch
    

    and so on for commit C.
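    The backward-pointing links can be inspected with git cat-file. This is only a sketch in a scratch repository: the commit names alpha, A, B, and C mirror the diagram, and the one-file-per-commit layout is an assumption made to keep it short.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Build the chain alpha <- A <- B <- C, one new file per commit.
for name in alpha A B C; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done

# The raw commit object for C: a tree (the snapshot), a parent line,
# author and committer lines, then the message.
git cat-file -p HEAD

# The "parent" line holds the hash ID of B.
recorded_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "C records parent: $recorded_parent"

# Walking the chain backwards from the branch tip.
git log --format='%h (parent: %p) %s'
```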

    The git reset command tells Git: Stop remembering the current last-commit using the current branch name. Instead, make this branch name point to the commit I specified. Since you specified commit A, that gave you:

                      B <-C   [no name to find C]
                     /
    δ <-γ <-β <-α <-A   <-- your-branch
    

    Besides the "move the branch name" action of git reset, it also has an effect on Git's index and your working tree, unless you tell it not to. With --hard, it resets both to match the snapshot in the commit you named. That snapshot is A's, so the file still contains the change from A. There are other modes for the git reset command that do other things, but let's not wander too far astray; my answers are already too long as is. 😀
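    A small reproduction of what happened, as a sketch: f.txt, b.txt, and c.txt are hypothetical file names, and the repository is a temporary one. After the reset the working tree is clean, yet f.txt still carries A's change, because that change is part of A's snapshot.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > f.txt
git add f.txt
git commit -q -m "alpha"

echo "change from A" >> f.txt
git commit -q -am "A"
hash_A=$(git rev-parse HEAD)

echo "b" > b.txt; git add b.txt; git commit -q -m "B"
echo "c" > c.txt; git add c.txt; git commit -q -m "C"

# Move the branch name back to A and reset index + working tree to A's snapshot.
git reset -q --hard "$hash_A"

git log --oneline     # only A and alpha remain reachable from the branch
git status --short    # prints nothing: index and working tree match A
cat f.txt             # still includes the line added in A
```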

    If you want your B and C commits back, they are recoverable, for a while. The main problem with recovering them is finding the hash ID of commit C. (You don't have to find the hash ID of commit B, because commit C holds that hash ID for you. You only have to find the last commit in the chain: Git will find everything earlier automatically, from there.)

    Fortunately, Git has something it calls reflogs, where it keeps the hash IDs that were previously stored in a name. Entries for commits that are no longer reachable from the branch tip, like C here, are kept for at least 30 days by default. You can look at the reflog for HEAD, or the reflog for your branch name, to find the hash ID of commit C. Use git reflog or git reflog <branch-name> to do this.
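    Recovery through the reflog might look like the sketch below. It assumes the reset was the most recent thing that moved HEAD, so that HEAD@{1} is commit C; in a real repository, read the git reflog output and pick the right entry rather than trusting that position. File names are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > f.txt;      git add f.txt; git commit -q -m "alpha"
echo "change from A" >> f.txt;               git commit -q -am "A"
echo "b" > b.txt;             git add b.txt; git commit -q -m "B"
echo "c" > c.txt;             git add c.txt; git commit -q -m "C"

# The reset from the question: the branch now ends at A, and C has no name.
git reset -q --hard HEAD~2

# The reflog still records where HEAD pointed before the reset.
git reflog

# Assumption: the entry just before the reset is commit C.
hash_C=$(git rev-parse 'HEAD@{1}')

# Put the branch name back on C; B comes along as C's parent.
git reset -q --hard "$hash_C"
git log --oneline
```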

    Assuming you put these two back (with another git reset --hard) so that your strand-of-pearls commits are back to:

    δ <-γ <-β <-α <-A <-B <-C   <-- your-branch
    

    you can now, very easily, add a new commit D to your branch that simply undoes the effect of commit A entirely, using git revert. Or, you can create a new commit D whose snapshot is the same as that of C, except that one particular file's contents are those extracted from any given commit.


    [1] These individual file snapshots are later further compressed, beyond their initial compression and Git-ization, in what Git calls pack files, which means that even multiple versions of large text files don't take much space in the end. You don't need to care about these details, though: it suffices to remember that each commit acts as a full archive, like a tar or zip or rar archive, of every file.


    Using git revert

    The git revert command works by comparing a commit's content to its parent's content. Whatever changed here, that change is to be un-done in the current set of files by reverse-applying the change, more or less.[2] So if you modified file F in commit A, but did nothing to any other file, git revert hash-of-A will back out that one change. Git will make a new commit out of the resulting files.
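    As a sketch of the revert approach, with hypothetical file names and a scratch repository: commit A changes f.txt, B and C touch other files, and reverting A adds a fifth commit that puts f.txt back while leaving B's and C's work alone.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > f.txt
git add f.txt
git commit -q -m "alpha"

echo "change from A" >> f.txt
git commit -q -am "A"
hash_A=$(git rev-parse HEAD)

echo "b" > b.txt; git add b.txt; git commit -q -m "B"
echo "c" > c.txt; git add c.txt; git commit -q -m "C"

# Add a new commit that reverse-applies A's change.
# --no-edit accepts the default commit message without opening an editor.
git revert --no-edit "$hash_A"

git log --oneline   # alpha, A, B, C are all still there, plus the revert
cat f.txt           # A's line is gone again
```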


    [2] Technically, a revert is a three-way merge in which the current commit is, as always, the current commit, the merge base is the child commit that you specify on your git revert command, and the other commit is the parent of that parent/child pair.


    Using git restore or git checkout

    Every commit has, as already mentioned, a full snapshot of all your files.

    To get one particular file out of one particular commit, you can use the new (since Git 2.23) git restore command:

    git restore -SW --source <commit> -- <path/to/file>
    

    The -S and -W options, both chosen here, tell git restore to write the replacement file to both the staging area (so that you don't have to git add the file afterward) and your working tree. The default would be to write to just your working tree, requiring a subsequent git add.

    The source commit can be a raw hash ID, or you can use any name that will locate the correct hash ID. If origin/main picks a commit that has the correct copy of the file, you can use --source origin/main. You can abbreviate --source as -s (note lowercase, vs the -S uppercase for staging-area).
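    Put together, the git restore route could look like this sketch. It needs Git 2.23 or later; f.txt and the commit message are invented for the example, and the parent of A is used as the source because, in this toy history, that is the last commit with the wanted copy of the file.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > f.txt
git add f.txt
git commit -q -m "alpha"

echo "change from A" >> f.txt
git commit -q -am "A"
hash_A=$(git rev-parse HEAD)

echo "b" > b.txt; git add b.txt; git commit -q -m "B"
echo "c" > c.txt; git add c.txt; git commit -q -m "C"

# Take f.txt from the commit just before A, writing it to both the
# staging area (-S) and the working tree (-W).
git restore -SW --source "${hash_A}^" -- f.txt

git status --short   # f.txt shows up as already staged

# Record the result as new commit D.
git commit -q -m "D: restore f.txt from before A"
cat f.txt
```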

    If your Git predates 2.23, you can use:

    git checkout <commit> -- <path/to/file>
    

    which has the same effect as the git restore -SW command (writes to both index / staging-area, and working tree).

    In any case, after using these methods, you will need to make one new commit.
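    For a Git older than 2.23, the same flow with git checkout, including the final commit, can be sketched as follows. The file names and commit message are again hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > f.txt
git add f.txt
git commit -q -m "alpha"

echo "change from A" >> f.txt
git commit -q -am "A"
hash_A=$(git rev-parse HEAD)

echo "b" > b.txt; git add b.txt; git commit -q -m "B"
echo "c" > c.txt; git add c.txt; git commit -q -m "C"

# Copy f.txt out of the commit before A into the index and working tree.
# The "--" separates the commit from the path.
git checkout "${hash_A}^" -- f.txt

# No git add is needed; the file is already staged.
git commit -q -m "D: take f.txt from before A"
git log --oneline
```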

    Using interactive rebase

    Instead of adding a new commit, it's possible to replace a whole series of commits with new and improved substitute commits. The git rebase command is intended for this purpose; when used with --interactive (or -i for short), git rebase is a powerful way to stop using a whole slew of old-and-bad commits. The old commits are not gone: just as with git reset --hard, Git still keeps the old commits for a while (at least 30 days by default). But your Git stops using them, in favor of the new-and-improved commits that git rebase builds before abandoning the old commits.

    Because this does abandon old commits, rebase is not always appropriate. In particular, if other Git repositories have copies of the old commits, it can be very hard to convince every Git repository to switch to the new and improved commits instead. The old, bad commits may keep coming back to haunt you. Adding a revert commit does not have this problem, since Git is built to add commits but not to drop old ones (reset and rebase being exceptions that work in your repository, but don't affect anyone else's).
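    For completeness, here is a sketch of dropping commit A with an interactive rebase. Normally you would edit the instruction sheet by hand in your editor, changing "pick" to "drop" on A's line. To keep the example non-interactive, it sets GIT_SEQUENCE_EDITOR to a sed command that does the same edit on the first line, which assumes A is the oldest commit being rebased and that the sheet uses the default unabbreviated "pick" commands. File names are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > f.txt
git add f.txt
git commit -q -m "alpha"

echo "change from A" >> f.txt
git commit -q -am "A"
hash_A=$(git rev-parse HEAD)

echo "b" > b.txt; git add b.txt; git commit -q -m "B"
echo "c" > c.txt; git add c.txt; git commit -q -m "C"
old_C=$(git rev-parse HEAD)

# Rebase everything after A's parent (that is: A, B, C).
# The sed script rewrites line 1 of the instruction sheet from
# "pick <A> ..." to "drop <A> ...", so A is left out of the new chain.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/drop/'" \
  git rebase -q -i "${hash_A}^"

git log --oneline   # alpha, then new copies of B and C
cat f.txt           # A's change is gone

# The old C still exists and can be found via the reflog for a while.
git cat-file -t "$old_C"
```

    B and C get new hash IDs here because their parent changed, which is exactly why this approach is awkward once the old commits have been shared.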