git rebase / squash - why do I need to resolve conflicts again?

I really tried looking through similar topics, but I just can't seem to get a grip on the concept.

I forked a repo and made some changes. Over time, I also used git fetch upstream and git merge upstream/<branch> a few times. Sometimes there were conflicts, which I resolved and retested.

Now when it comes to pushing the changes upstream, I want to make a single commit. The instruction that I was given is to use git fetch upstream and git rebase -i upstream/<branch>.

What I don't understand is why I am stuck dealing with conflict resolution once again, when my fork is current with its origin. I could have made a backup of all my modified files, nuked my fork, forked again, and restored the backup, and then I would have no conflicts to resolve and would be ready to commit. This process seems mechanical enough that I don't understand why I have to go the hard way (resolving conflicts again).

Could someone help me understand?

Solution

The best answer to "why do you need to re-resolve everything" is "you don't". But this means abandoning the instructions you were given (it's not clear to me who gave you these instructions).

As ElpieKay commented, you can use git rerere to automate the reuse of previously recorded resolutions. In this way, you are re-resolving everything, but getting Git to do it, instead of having to do it by hand.
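To make the rerere idea concrete, here is a toy model in Python. It is only a sketch of the concept and not how Git implements it. I'm assuming a conflict can be identified just by the two conflicting versions of a hunk, and the class and method names are made up. (In real Git you switch the feature on with git config rerere.enabled true.)

```python
from typing import Dict, Optional, Tuple


class ToyRerere:
    """Toy model of rerere: remember a conflict's resolution, replay it later."""

    def __init__(self) -> None:
        self._resolutions: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _key(side1: str, side2: str) -> Tuple[str, str]:
        # Sort the two sides so the key does not depend on which one is "ours":
        # a rebase sees the sides swapped compared with the original merge.
        return (side1, side2) if side1 <= side2 else (side2, side1)

    def record(self, ours: str, theirs: str, resolved: str) -> None:
        """Remember how you resolved this conflict by hand."""
        self._resolutions[self._key(ours, theirs)] = resolved

    def replay(self, ours: str, theirs: str) -> Optional[str]:
        """Return the remembered resolution, or None if this conflict is new."""
        return self._resolutions.get(self._key(ours, theirs))


rr = ToyRerere()
# During `git merge upstream/<branch>`: ours is your line, theirs is upstream's.
rr.record("timeout = 30", "timeout = 60", "timeout = 45")
# During the rebase, the same hunk conflicts again with the sides swapped:
print(rr.replay("timeout = 60", "timeout = 30"))  # timeout = 45
print(rr.replay("retries = 1", "retries = 3"))    # None (new conflict: resolve by hand)
```

The point is that the resolution work is still being redone. Something simply looks it up instead of you typing it in again.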

If your intent, though, is to make a single new "squash merge" commit (i.e., not-a-merge, just an ordinary commit) atop the current upstream tip to make a pull request from it, there is no need to go through any of this. You can instead just do the following sequence of commands (note assumptions below):

git fetch upstream
git checkout -b for-pull-request upstream/branch
git merge --squash branch
git commit
git push origin -u for-pull-request

and then make the pull request using the clicky web buttons on the web service for origin.

Eventually, they accept your pull request, and at this point you will delete branch, abandoning all this work in favor of the new single-commit that they accepted. (See the end for how to rename/reset instead.) Or, they don't accept it, at which point you can delete for-pull-request, and can continue working on branch and eventually repeat the process.

This assumes:

Your fork consists of two repositories: your repository—the one on your own computer / laptop / whatever—and another repository on a web service provider like GitHub.
Their (upstream's) fork is on the same web service provider.
On your computer, in your repository, you use the short name upstream to refer to the upstream repository (their fork), and the short name origin to refer to your own fork as stored on the web service provider. The web provider offers the clicky buttons that make pull requests.
Perhaps most important, this assumes that you are willing to abandon that local branch in favor of the upstream's accepted pull request, if and when that occurs.

There are several keys to understanding all of this, most of which are tied into the way Git's commit graph works. The remainder have to do with the (single) magic trick by which Git distributes repositories, and what the various commands—git merge, git rebase, and git merge --squash—actually do, and what git fetch and git push really do.

I don't really want to write another giant article on this (... too late :-) ), so let's summarize by noting these points. They are not in a great order, but I'm not sure there is a great order: there are a lot of cross-referential items.

Git normally just adds new commits. Adding a new commit causes the current branch name—as stored in HEAD—to point to the new commit, while the new commit points back to whatever was the HEAD commit.
But git rebase works by copying existing commits to new commits, then abandoning the original commits. It does the copying with a "detached HEAD", where HEAD points directly to commits, rather than using a branch name. ("Add new commit" still works as usual, except that it just updates HEAD directly, rather than the branch whose name isn't stored in HEAD any more.) The commits that are copied exclude any merge commits. In a way, this means any conflict resolution you did then is largely lost (unless you have git rerere enabled). (It's actually lost for different reasons, which I have not figured out how to explain well.)
Each copy is made much as if by git cherry-pick, sometimes by literally running git cherry-pick.
Each cherry-pick can result in merge conflicts, because cherry-picking a commit runs the merge machinery, merge as a verb as I like to call it.

The merge base of this operation is the parent of the commit being cherry-picked, even if that's not a very sensible merge base. The --ours commit is the current or HEAD commit, as always, and the other commit—the --theirs commit—is the commit being copied / cherry-picked. The new commit, at the end of the successful cherry-pick, is made as an ordinary, non-merge commit.
Running git merge, by contrast, is sort of less complicated, except that there are various separate cases.

Sometimes Git sees that no merge is required after all, because the current commit is already an ancestor of the target commit, so it's possible to do a fast-forward operation: change the commit to which the current branch points, so that the branch points to the target commit.

Sometimes a merge is required, or (using --no-ff) you tell Git not to do a fast-forward even if it could. In this case, Git uses the merge machinery to do the merge. The merge base for this merge is the actual merge base of the two tip commits. As with all merges, there may be a merge conflict here. The final commit made at the end is a merge commit: merge as an adjective or noun.

The presence of a merge commit in the commit graph means that a future merge will find a new, better merge base. This will avoid having to re-resolve conflicts.
Running git merge --squash is yet another special case. Like an actual merge, it uses the normal merge machinery to compute the true merge base of the two branch tips involved.

The two commits to be merged are, of course, HEAD (--ours) as always, and the commit you name on the command line. Since the merge base is the true merge base, which takes existing merge-as-a-noun merges (merge commits) into account to minimize new merge conflicts, you can get a relatively conflict-free merge result.

The final commit, however, harks back to the cherry-pick idea: the final commit is not a merge commit. It's just an ordinary commit. The tree is that computed by merge-as-a-verb, but the commit is a regular commit, not a merge-as-a-noun. (For no particularly good reason, the --squash flag turns on the --no-commit flag as well, forcing you to run git commit yourself. There probably was a reason once—probably it was most convenient for whoever wrote the initial --squash code to just exit early—but today there's no reason for this.)
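The three outcomes of git merge just described can be sketched as a toy model in Python: a fast-forward, a real merge commit, and the --squash ordinary commit. This is my own simplification and not Git's code. Commits are just names mapped to their parent lists, and the merge machinery that computes file contents is left out entirely. The function names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # commit name -> list of parent commit names


def is_ancestor(graph: Graph, ancestor: str, commit: str) -> bool:
    """True if `ancestor` is `commit` itself or reachable from it via parents."""
    stack, seen = [commit], set()
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return False


def toy_merge(graph: Graph, head: str, other: str, new_commit: str,
              squash: bool = False, no_ff: bool = False) -> str:
    """Return the commit the current branch should point to afterwards."""
    if is_ancestor(graph, other, head):
        return head  # already up to date: nothing to merge
    if is_ancestor(graph, head, other) and not (no_ff or squash):
        return other  # fast-forward: just move the branch name
    # Otherwise merge-as-a-verb runs (file contents not modeled here) and a
    # new commit is made: two parents for a real merge, one for --squash.
    # (With --squash, pretend you also ran the `git commit` it leaves to you.)
    graph[new_commit] = [head] if squash else [head, other]
    return new_commit


g: Graph = {"base": [], "x": ["base"], "y": ["base"]}
print(toy_merge(g, "base", "x", "m1"))                   # x (fast-forward)
print(toy_merge(g, "x", "y", "m2"), g["m2"])             # m2 ['x', 'y']
print(toy_merge(g, "x", "y", "s", squash=True), g["s"])  # s ['x']
```

Only the two-parent commit m2 records that y has been merged. The squash commit s has a single parent, so a later merge cannot learn anything from it.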

We add to this these facts:

Your fork was, originally, a clone of some other repository. So it started out with the same commits (the same history) that upstream had. Since then, you and they have diverged a bit, but meanwhile you update your fork by git push-ing commits from your own local repository.
When you git fetch upstream, you pick up their commits from their fork, putting them into your own local repository. The tracking names for these are upstream/master and so on.

When you git fetch origin (if you ever need to), you pick up commits from your fork, putting them into your own local repository. The tracking names for these are origin/master and so on.

Your repository has your own branches (of course).

This means your local repository—the one on your own computer—has the complete union of "their fork", "your fork", and "your commits". You can, at any time, push your own commits back to your own fork.
You can easily make "pull requests" using your fork, because your provider (e.g., GitHub) remembers for you a link between "your fork" and "their fork".
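Here is a toy model of the git fetch upstream step. Again, this is my own simplification and not Git's actual protocol. A repository is a dict of commits (name to parents) plus a dict of names. Fetching copies the commits you don't have yet. That works because a commit's ID is the same in every repository. It then records the other side's branch tips under upstream/... remote-tracking names and leaves your own branches alone. The Repo class and fetch function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Repo:
    commits: Dict[str, List[str]] = field(default_factory=dict)  # commit ID -> parent IDs
    refs: Dict[str, str] = field(default_factory=dict)           # name -> commit ID


def fetch(local: Repo, remote: Repo, remote_name: str) -> List[str]:
    """Copy commits `local` lacks, then update <remote_name>/<branch> names."""
    new = [c for c in remote.commits if c not in local.commits]
    for c in new:
        # Commit IDs are universal, so "already have it" is a simple ID check.
        local.commits[c] = list(remote.commits[c])
    for branch, tip in remote.refs.items():
        local.refs[f"{remote_name}/{branch}"] = tip  # remote-tracking name only
    return new


upstream = Repo({"o1": [], "o2": ["o1"], "T": ["o2"]}, {"branch": "T"})
mine = Repo({"o1": [], "o2": ["o1"], "A": ["o2"]}, {"branch": "A"})
print(fetch(mine, upstream, "upstream"))  # ['T']
print(mine.refs)  # {'branch': 'A', 'upstream/branch': 'T'}
```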

So, let's draw the—or a simplified version of the—commit graph that you have in your repository, as a result of this:

I forked a repo and made some changes. Over time, I also used git fetch upstream and git merge upstream/<branch> a few times. Sometimes there were conflicts, which I resolved and retested.

You have:

...--o--*--o...o...o--T    <-- upstream/branch
         \      \
          A--B---M---C   <-- branch (HEAD), origin/branch

Here, commit * is the common base commit from which you and upstream started. You made some commits, such as A through C, on your own branch. You also made at least one merge commit M. I'm assuming you ran git push origin branch at various points, so that origin/branch in your own repository records your fork's branch name branch as pointing to your tip commit C. (It doesn't really matter if it does or not, since we are not using it below.)

If you were to, now, run git rebase -i upstream/branch, this would list commits A, B, and C, but not commit M, as commits to copy ("pick"). The target for the copies would be commit T, which is the tip of upstream/branch, which your Git is remembering from the branch branch on upstream.
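Here is a toy version of how that list of commits to copy is chosen. It takes everything reachable from your branch tip but not from upstream/branch, oldest first, and drops merge commits. It is a sketch under my own assumptions. The commit names stand in for the drawing above, and real git rebase also does things this ignores, such as skipping commits whose changes are already upstream.

```python
from typing import Dict, List, Set

# The graph drawn above; "fork" is the commit marked *, and u1..u3 stand in
# for the upstream o's (u2 is the one merged into M). Names are made up.
graph: Dict[str, List[str]] = {
    "o0": [], "fork": ["o0"], "u1": ["fork"], "u2": ["u1"], "u3": ["u2"], "T": ["u3"],
    "A": ["fork"], "B": ["A"], "M": ["B", "u2"], "C": ["M"],
}


def reachable(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from `tip` by following parents (tip included)."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def commits_to_copy(graph: Dict[str, List[str]], upstream_tip: str, branch_tip: str) -> List[str]:
    """Roughly `upstream_tip..branch_tip`, oldest first, with merge commits dropped."""
    exclude = reachable(graph, upstream_tip)
    order: List[str] = []
    visited: Set[str] = set()

    def visit(c: str) -> None:
        if c in visited or c in exclude:
            return
        visited.add(c)
        for parent in graph[c]:
            visit(parent)
        order.append(c)  # parents are appended before children

    visit(branch_tip)
    return [c for c in order if len(graph[c]) < 2]  # skip merge commits


picks = commits_to_copy(graph, "T", "C")
print(picks)                            # ['A', 'B', 'C']
# Each copy is a cherry-pick whose merge base is the original's parent:
print({c: graph[c][0] for c in picks})  # {'A': 'fork', 'B': 'A', 'C': 'M'}
```

Copying A and B onto T uses fork and A as merge bases. M is not among them, so the conflicts M resolved come right back.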

If you put up with redoing all the merge conflicts, and copied the three commits to three new commits, you would get:

                        A'-B'-C'   <-- branch (HEAD)
                       /
...--o--*--o...o...o--T    <-- upstream/branch
         \      \
          A--B---M---C   <-- origin/branch

You could then push (or force-push) this to origin, or you could now (without merge conflicts this time) collapse the A'-B'-C' sequence down to a single "squashed" commit S:

                        S   <-- branch (HEAD)
                       /
...--o--o--o...o...o--T    <-- upstream/branch
         \      \
          A--B---M---C   <-- origin/branch

and again push or force-push this to your fork origin as branch-name branch.

(Note that I stopped marking the base commit * once it was no longer an interesting base commit. It was particularly interesting when we were thinking of running git rebase -i upstream/branch, since it was the point where branch and upstream/branch rejoined permanently. This determines the set of commits that have to be copied for a git rebase operation.)

But what if, instead, we create a new branch named for-pull-request, pointing to commit T, and check it out:

...--o--o--o...*...o--T    <-- for-pull-request (HEAD), upstream/branch
         \      \
          A--B---M---C   <-- branch, origin/branch

Now we run git merge --squash branch. This invokes the merge machinery, using current commit T as HEAD and commit C as the other commit. Git finds the merge base, which is now the commit I marked *: it's the first (or "lowest") common ancestor, rather than the original fork point (the parent of A) that the first cherry-pick of a rebase uses as its base. That is, starting from both commit C and commit T and working backwards, Git finds the commit "nearest to" both branch tips, which is the commit you last merged.

This means the only conflicts you will see come from differences between * and C on your side (mainly whatever you did in C, after the last merge M) that overlap with anything they did since *.
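Here is a brute-force sketch of that merge-base search. It follows the definition in the git merge-base documentation: a best common ancestor is a common ancestor that is not an ancestor of any other common ancestor. It is deliberately naive and nothing like Git's efficient implementation. The commit names are made up to match the drawing above.

```python
from typing import Dict, List, Set

# The graph drawn above: "star" is the upstream commit you last merged (marked *),
# "fork" is where your branch originally started. Names are made up.
graph: Dict[str, List[str]] = {
    "o0": [], "fork": ["o0"], "u1": ["fork"], "star": ["u1"], "u3": ["star"], "T": ["u3"],
    "A": ["fork"], "B": ["A"], "M": ["B", "star"], "C": ["M"],
}


def ancestors(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """`tip` plus everything reachable from it through parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: Dict[str, List[str]], a: str, b: str) -> List[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    )


print(merge_bases(graph, "T", "C"))  # ['star']: the commit you last merged
print(merge_bases(graph, "T", "B"))  # ['fork']: the base before merge M existed
```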

When git merge --squash branch has finished using the merge machinery to merge commits T and C using this * as the merge base, it stops and makes you run git commit manually. When you do that, you get a new ordinary commit, which we can call S for Squash:

                        S   <-- for-pull-request (HEAD)
                       /
...--o--o--o...o...o--T    <-- upstream/branch
         \      \
          A--B---M---C   <-- branch, origin/branch

Now, except for the fact that this is named for-pull-request, this is the same graph we would get if we did the git rebase -i upstream/branch that was in those instructions you were given. Moreover, if we resolve any merge conflicts correctly, commit S has the same source tree (the same files) we'd get the other way, but we'll have far fewer, if any, merge conflicts to resolve in the first place.
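To see why the squash route has fewer conflicts, here is a toy three-way merge in Python. It is my own simplification: it merges whole files rather than individual hunks the way Git's merge machinery does, and the file names and contents are invented. It compares two merges for a file you changed in A and already resolved in M. The first is the rebase's first cherry-pick, whose merge base is A's parent. The second is the squash merge, whose merge base is *.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]  # path -> whole-file contents (toy: no per-hunk merging)


def three_way_merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Toy file-level three-way merge: returns (merged snapshot, conflicted paths)."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:        # same on both sides (or unchanged on both)
            result = o
        elif o == b:      # only "theirs" changed this file
            result = t
        elif t == b:      # only "ours" changed this file
            result = o
        else:             # both sides changed it differently: conflict
            conflicts.append(path)
            result = o    # placeholder; a human must resolve it
        if result is not None:  # None means the file is absent (deleted)
            merged[path] = result
    return merged, conflicts


# Hypothetical file contents, loosely following the graph above.
fork_point = {"config": "v0", "notes": "n0"}                 # parent of A
commit_a = {"config": "mine", "notes": "n0"}                 # your change to config
star = {"config": "u1", "notes": "n0"}                       # upstream commit merged in M
upstream_tip = {"config": "u1", "notes": "n0", "up": "new"}  # T
commit_c = {"config": "resolved", "notes": "n1"}             # C, after resolving config in M

# Rebase, first pick: base = parent of A, ours = T, theirs = A.
_, rebase_conflicts = three_way_merge(fork_point, upstream_tip, commit_a)
print(rebase_conflicts)  # ['config']: the conflict you already resolved in M, again

# Squash merge: base = *, ours = T, theirs = C.
squashed, squash_conflicts = three_way_merge(star, upstream_tip, commit_c)
print(squash_conflicts)  # []
print(squashed)          # {'config': 'resolved', 'notes': 'n1', 'up': 'new'}
```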

You can now push this name (and commit S) to your fork on your provider (GitHub?), and make the pull request. If they accept it, you can delete your branch branch, then create a new branch branch pointing to S (or to its merge in upstream; or, if they squash-merged, to their S', which is basically the same as S but has a different hash ID and a different committer). In any case, you'll have to git fetch upstream again to get their commits.

Instead of delete-and-re-create, you can simply force your branch name branch to point to the newest commit. Let's say they squash-merge your squash into their fork, so that they have new commit S' that's a copy of S, and you git fetch that:

                        S   <-- for-pull-request (HEAD), origin/for-pull-request
                       /
...--o--o--o...o...o--T--S'  <-- upstream/branch
         \      \
          A--B---M---C   <-- branch, origin/branch

You can now run git branch -f branch upstream/branch, or git checkout branch && git reset --hard upstream/branch, to get your Git to have this:

                        S   <-- for-pull-request, origin/for-pull-request
                       /
...--o--o--o...o...o--T--S'  <-- branch, upstream/branch
         \      \
          A--B---M---C   <-- origin/branch

(which of these branches is HEAD depends on which command-sequence you use).

Once you git push --force origin branch to send the updated branch to your fork on your provider and delete for-pull-request (in both your repo and your fork), you will effectively abandon the origin/branch series of commits, giving:

                        S   [abandoned]
                       /
...--o--o--o...o...o--T--S'  <-- branch, origin/branch, upstream/branch
         \      \
          A--B---M---C   [abandoned]

and if we stop drawing all the abandoned commits, everything looks clean and neat, which is probably the point of all of this. :-)
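Here, "abandoned" just means that no name reaches the commit any more. The quick Python sketch below uses made-up names standing in for the o's. It compares the names before the cleanup with the names afterwards. It ignores reflogs, which in real Git keep such commits recoverable for a while until they expire and git gc removes them.

```python
from typing import Dict, Iterable, List, Set

# Final graph above, with made-up names for the o's.
graph: Dict[str, List[str]] = {
    "fork": [], "u1": ["fork"], "u2": ["u1"], "T": ["u2"], "S'": ["T"],
    "A": ["fork"], "B": ["A"], "M": ["B", "u2"], "C": ["M"], "S": ["T"],
}


def reachable(graph: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Commits reachable from any of `tips` by following parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


before = {"branch": "C", "origin/branch": "C", "upstream/branch": "S'",
          "for-pull-request": "S", "origin/for-pull-request": "S"}
after = {"branch": "S'", "origin/branch": "S'", "upstream/branch": "S'"}

print(sorted(set(graph) - reachable(graph, before.values())))  # []
print(sorted(set(graph) - reachable(graph, after.values())))   # ['A', 'B', 'C', 'M', 'S']
```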