git rebase pull merge-conflict-resolution

Need clarity with git workflow involving git pull and pull requests

The above diagram gives us a good idea of git pull and git pull --rebase. I'm getting confused about one thing here. Let me elaborate -

1. Case 1 -> git pull --rebase origin master

My local master branch after the command - A B C X Y D' E'

My remote master branch after the command - A B C X Y

If I now execute, git push origin master:master, my remote master branch will look like - A B C X Y D' E'

2. Case 2 -> git pull origin master

My local master branch after the command - A B C D E F

My remote master branch after the command - A B C X Y

How will git push origin master:master behave in this case? Also, I don't understand why, in any scenario, we would want to use git pull without --rebase.

Solution

What you're missing is, I think, best explained by avoiding git pull. Still, let's imagine git pull had a hypothetical --merge option so that we could say you are running git pull --merge origin master. (You are getting a merge already; this option would be the default, if it were an explicit option.) That is, your git pull origin master runs the equivalent of:

git fetch origin; then
git merge -m "merge branch master of <url>" origin/master.

That produces the graph, which they drew as:

A--B--C--D--E--F   <-- master
       \      /
        X----Y

(I've turned it sideways here. Rotate it 90° counterclockwise to match theirs.)

I'd now like to suggest redrawing it like this:

        D--E
       /    \
A--B--C      F   <-- master
       \    /
        X--Y

Now that I've drawn the graph this way, which commits are "on" branch master? If you pick A-B-C-D-E-F, why didn't you also pick X-Y? If you pick A-B-C-X-Y-F, why didn't you also pick D-E?

The fact is that all eight commits, including both D-E and X-Y, are "on" branch master. The name master identifies commit F, but commit F is a merge commit. It reaches back to two different commits: E and Y. Those two different commits reach back to D and X respectively, and those two different commits reach back to a single common shared starting point C.
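To make the "all eight commits" claim concrete, here is a minimal sketch that models the graph as a plain dictionary from each commit to its parents and walks backwards from F. The dictionary and the reachable helper are my own illustration, not anything Git exposes; real Git stores commits as hashed objects.

```python
# Toy model of the merge graph: each commit maps to its list of parents.
# This is an illustrative dictionary, not Git's real object storage.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["D"],
    "X": ["C"],
    "Y": ["X"],
    "F": ["E", "Y"],  # merge commit: two parents
}


def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# The name master points to F, so "on master" means "reachable from F".
on_master = reachable("F", parents)
print(sorted(on_master))  # ['A', 'B', 'C', 'D', 'E', 'F', 'X', 'Y']
```

Both legs show up because the walk follows every parent of F, not just the first one.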

Commit C was the merge base of the two tip commits, at the time you had Git run git merge, via git pull. So Git found out what you did, on the C-to-E leg, by running a diff between the snapshots in commits C and E. Then Git found what they did, on the C-to-Y leg, by running a diff between C and Y. Then Git took the two diffs and combined them, applied the combined result to the shared snapshot from commit C, and used that to make new merge commit F.
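The following is a rough sketch of those two steps: finding the merge base, then combining the two sets of changes. It is a deliberate simplification. Snapshots are dictionaries from file name to content, the file names README and app.py are made up for the example, and the merge works on whole files, whereas real Git merges line by line within each file.

```python
# Graph before the merge: commit -> list of parents (toy model).
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "E": ["D"],
    "X": ["C"], "Y": ["X"],
}


def ancestors(tip, parents):
    """Return tip plus every commit reachable from it."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    )


def three_way_merge(base, ours, theirs):
    """Whole-file three-way merge of snapshots (dicts of path -> content)."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # both sides agree (or neither changed it)
        elif o == b:
            result = t      # only they changed this file
        elif t == b:
            result = o      # only we changed this file
        else:
            raise ValueError(f"conflict in {path}")
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged


# Hypothetical snapshots for the three commits involved in the merge.
snapshots = {
    "C": {"README": "v1", "app.py": "v1"},
    "E": {"README": "v1", "app.py": "v2"},  # you changed app.py
    "Y": {"README": "v2", "app.py": "v1"},  # they changed README
}

base = merge_bases("E", "Y", parents)[0]
snapshot_f = three_way_merge(snapshots[base], snapshots["E"], snapshots["Y"])
print(base)        # C
print(snapshot_f)  # {'README': 'v2', 'app.py': 'v2'}
```

If both sides had changed the same file in different ways, this toy version raises an error; that corresponds to the point where Git would stop with a merge conflict and ask you to resolve it.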

Merge commit F has one snapshot, just like every other commit. Where it's different from the other commits is that it has two parents, E and Y. So you can ask Git what changed from E to F, and what you'll get is the changes brought in because of the lower (in my drawing) leg of the merge; or you can ask what changed from Y to F, and you'll see the changes that were brought in because of the upper leg of the merge.

In any case, this is the job (and point) of a merge: to combine work, keeping a record of the fact that the work was combined. You can now see exactly what happened: you did something while they were working, they did something while you were working, and then you combined it all at once.

Using rebase makes for a "cleaner" history: it looks like they did something, you waited for them to finish, then you started on your task knowing what they'd done and did your work and committed it. That's not really what happened, but maybe it's just as good. Maybe it's better because to a future you, or them, or whoever, it's simpler: it does not require figuring out whether something went wrong during the work-combining. But if something did go wrong, it may hide what that something was, making things worse for the future you/them/whoever.
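One reason the rebased commits are called D' and E' rather than D and E is that a commit's hash ID depends on its parent. Here is a small sketch of that idea. The commit_id function is a stand-in I made up: real Git hashes the full commit object (tree, parents, author, committer, message), not just a label, but the consequence is the same.

```python
import hashlib


def commit_id(parent_ids, change):
    """Toy commit ID: a hash over the parent IDs and a label for the change.

    Assumption for illustration only; real Git hashes the whole commit object.
    """
    payload = ",".join(parent_ids) + "|" + change
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:7]


# Shared history, then the two diverging legs.
a = commit_id([], "A")
b = commit_id([a], "B")
c = commit_id([b], "C")
d = commit_id([c], "D")   # your work, on top of C
e = commit_id([d], "E")
x = commit_id([c], "X")   # their work, on top of C
y = commit_id([x], "Y")

# "Rebase": replay the same changes D and E, but on top of Y instead of C.
d_prime = commit_id([y], "D")
e_prime = commit_id([d_prime], "E")

print("D :", d, " D':", d_prime, " same ID?", d == d_prime)
print("E :", e, " E':", e_prime, " same ID?", e == e_prime)
```

The changes are the same, but the copies have new parents and therefore new IDs, which is why the original D and E no longer appear in the rebased history.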

This is why you have a choice: one may be better than the other, or not.

[Edit:] What `git push` does

When you run:

git push origin master

or its equivalent but more-explicit variant:

git push origin master:master

your Git will:

1. use the name origin to find the URL for this git push operation (git config --get remote.origin.pushurl; if that's unset, git config --get remote.origin.url);
2. call up whatever responds to this URL: that should be the other Git software, hooked up to the other repository;
3. offer to send them your latest master commit by its hash ID; and
4. proceed from there.

Let's suppose first that you used rebase, so that your latest master commit hash ID is the hash ID of commit E'. Your Git offers to send this commit to their Git. They have never heard of this hash ID, so they say yes please, send that one, and tell me about its parent(s). Your Git then tells them the hash ID of commit D'; they haven't heard of that one either, so your Git tells them about the parent of D', which is Y. At this point they say to your Git: Ah, I have commit Y, you can stop sending things now; package up what I'll need for the commits I asked for, knowing that I have commit Y and every earlier commit.

Alternatively, let's suppose for the moment that you used git merge. Your Git will offer to send commit F (by hash ID). Their Git will say yes to that one, so your Git will now offer to send both parents, E and Y. They will say no thanks to Y because they already have that one, but yes please to E, so your Git will then offer D; they will say yes to that one as well, and your Git will then either offer C, or realize that they have C because they have Y: if your Git does offer C they'll say they don't need it, so this works out the same either way (it's just more efficient if your Git is smarter).
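The end result of either conversation can be sketched as a set difference: everything reachable from your tip, minus everything reachable from the tip they already have. This only computes the outcome. The real exchange is the back-and-forth offer/answer described above, and the function names here are my own.

```python
def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def commits_to_send(my_tip, their_tip, parents):
    """Commits I have that they cannot already reach from their tip."""
    return reachable(my_tip, parents) - reachable(their_tip, parents)


# Your repository after git pull --rebase (toy model: commit -> parents).
rebased = {
    "A": [], "B": ["A"], "C": ["B"], "X": ["C"], "Y": ["X"],
    "D'": ["Y"], "E'": ["D'"],
}

# Your repository after a plain git pull (merge).
merged = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "X": ["C"], "Y": ["X"], "F": ["E", "Y"],
}

# Their master points to Y in both scenarios.
after_rebase = commits_to_send("E'", "Y", rebased)
after_merge = commits_to_send("F", "Y", merged)
print(sorted(after_rebase))  # ["D'", "E'"]
print(sorted(after_merge))   # ['D', 'E', 'F']
```

In the merge case your Git sends three commits instead of two, since the merge commit F itself has to go along with D and E.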

Now that your Git knows which commits to send, and which commits they already have, your Git makes a reasonably minimal thin pack—this technically depends on the chosen push protocol but everyone should be using the "smart" protocol these days—containing the necessary commits and objects, knowing that the other Git repository already has all the objects that go with all the commits they already have. Your Git then sends over this "thin pack" to them, which they save away for further use if all goes well.

Finally, your Git sends a polite request of the form: Please, if it's OK, set your branch name master to ________. Let me know if it was OK. Your Git fills in the blank with the hash ID from your own master. Their Git then checks to see if the new commits add on to their own master branch, without dropping from their master any commits they had before.

Both scenarios—where you ask them to add F, or where you ask them to add E'—do add on, keeping their existing commit Y in their branch, so they probably accept your polite request.
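The "does it add on?" check is usually called a fast-forward check, and in a toy model it amounts to asking whether their current tip is reachable from the proposed new tip. The sketch below puts both scenarios in one dictionary for convenience, and adds a third case for contrast: pushing E without merging or rebasing first, which would drop X and Y from their master.

```python
def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def is_fast_forward(old_tip, new_tip, parents):
    """True if moving the branch from old_tip to new_tip keeps old_tip in its history."""
    return old_tip in reachable(new_tip, parents)


# Toy graph holding the merge result (F) and the rebase result (D', E') together.
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "E": ["D"],
    "X": ["C"], "Y": ["X"],
    "F": ["E", "Y"],
    "D'": ["Y"], "E'": ["D'"],
}

their_master = "Y"
print(is_fast_forward(their_master, "F", parents))   # True: merge keeps Y
print(is_fast_forward(their_master, "E'", parents))  # True: rebase keeps Y
print(is_fast_forward(their_master, "E", parents))   # False: Y would be dropped
```

That last case is the one their Git would refuse by default, which is why you need to merge or rebase before pushing in the first place.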

Note that they never know or care what branch name you are using to find these commits. They care only what branch name they are asked to set, to what hash ID, and what the various commits involved have in them.
