I want to generate a patch file of the differences between my branch and master. But the branch is quite long-lived so I just did a merge from master to bring it up to date. I can see the differences fine if I start creating a pull request in Bitbucket. But when I do git diff master..
on my branch I see differences shown that aren't there. Are they resulting from the merge? How can I get a list of differences the same as Bitbucket - just the differences between my branch and master right now?
It's not clear to me quite where your confusion starts, but it's worth noting that using git diff
is quite different from generating a pull request. Eventually, it will boil down to running git diff
on the correct specific commits. The trick lies in finding the right commits.
First, remember what it is that Git keeps. At a sort of fundamental level, what Git cares about are source snapshots, saved in the form of commits. A commit contains a complete snapshot of some source tree. A commit also contains some metadata: name and email addresses of the person, or sometimes two people, who made the commit (author and committer: they may be the same, or separate) and time-stamps for when they made it; a parent commit ID, so that Git can present the series of commits as a history of who (author) did what (see below), and when (timestamp); and a log message, to provide the author's description of why they did what they did.
Since each commit is a full snapshot, in order to see who actually did what, we must use a command like git diff
. Suppose we have two commits done in succession, on branch master
, like this:
(parent)   (child)
df731ac <- 049a12b   <-- master
A branch name like master
lets us find the most recent commit 049a12b
. We use the child's stored parent ID df731ac
to find the parent, and then we can run git diff df731ac 049a12b
—or much more simply, git show master
—to compare df731ac
to 049a12b
.
Whatever comes up as different here, the author of 049a12b
must have changed it. But df731ac
(the predecessor or parent commit) is a complete snapshot, and 049a12b
(the successor or child commit that is the tip of branch master
) is also a complete snapshot. Knowing this is helpful for understanding the next part.
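As a minimal sketch of that equivalence, the script below builds a throwaway two-commit repository and compares the explicit two-commit diff with what git show prints. The temporary directory, file name, and author identity are placeholders invented for the illustration, not anything from the question.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the unborn branch master, whatever the local default is
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"

echo "first line" > file.txt
git add file.txt
git commit -q -m "parent commit"

echo "second line" >> file.txt
git commit -q -a -m "child commit"

parent=$(git rev-parse master^)   # the parent ID stored in the tip commit
child=$(git rev-parse master)     # the tip commit that the name master points to

# Compare the two snapshots explicitly...
explicit=$(git diff "$parent" "$child")
# ...and let git show find the parent (--format= suppresses the metadata header).
shown=$(git show --format= master)

echo "$explicit"
```

Both variables should hold the same patch text, since git show simply looks up the parent for you.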
Note that, as in the drawing above, a branch name like master
or develop
or feature/tall
simply contains the ID of one specific commit. We call this commit the tip commit of the branch. When you add new commits to a branch, what Git does is create the new commit, which gives it an ID, and then write the new tip commit ID into the branch name. The branch names therefore "move" over time: they always point to the latest (child-most) commit. Each new commit has, as its parent, the ID that was the tip of the branch before, which lets Git follow these backwards pointers through the repository.
If Git commit hash IDs were just one letter, we could draw a simple three-commit repository as:
A <-B <-C <-- master
and adding a new commit would simply consist of writing commit D
with C
as its parent, and making master
point to D
:
A--B--C--D <-- master
The special name HEAD
normally contains the name of a branch. So if HEAD
contains master
, Git can use HEAD
to select branch master
, and master
to find D
. In other words, Git typically starts by using a branch name to get a tip commit ID. Then it looks at that commit to get its parent ID, then looks at the parent commit for another parent, and so forth. This is what branch names are for, and what they do: they find tip commits.
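The chain from HEAD to a branch name to a tip commit to its parent can be followed by hand. This is only a sketch in a scratch repository; the commit subjects A through D stand in for the one-letter hash IDs used in the drawings, and the identity and file name are made up.

```shell
set -eu
repo=$(mktemp -d)                          # scratch repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"

# Four commits whose subjects mimic the one-letter IDs A, B, C, D.
for name in A B C D; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

branch=$(git symbolic-ref --short HEAD)   # HEAD contains a branch name
tip=$(git rev-parse "$branch")            # the branch name contains the tip commit's ID
parent=$(git rev-parse "$tip^")           # the tip commit records its parent's ID

echo "HEAD -> $branch -> $tip (parent $parent)"
git log --format='%h %s' "$branch"        # walks the parent links backwards from the tip
```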
git diff
All git diff
does (most of the time, anyway [1]) is take any two individual commits like this and compare them. To do this, it needs to resolve its two inputs to hash IDs. Those hash IDs identify the two commits, whose snapshots it then compares.
When you run git diff master..
, Git's diff
translates master..
into master
and HEAD
(the default to fill in for an empty position around ..
is HEAD
), and then translates master
into a branch tip ID. If the tip commit of branch master
is 049a12b
as in the drawing above, the hash ID for the left half of the comparison will be 049a12b
. For the right half, git diff
must read HEAD
to get its branch name, such as develop
or feature/tall
or whatever. That branch name then maps to its own tip commit. Let's say its abbreviated ID is 6bc9702
. Then this git diff
command ultimately tells Git to extract the source snapshot for 049a12b
, the one for 6bc9702
, and compare those two.
You can, however, supply any two hashes for any two commits that you have:
git diff 0123456 fedcba9
for instance. But you have to find those commits, or some name that Git will turn into those commits.
(It doesn't matter if you say git diff A B
or git diff A..B
; these mean exactly the same thing. This is different from git log
and most other Git commands: only git diff
has this special handling for the two-dot ..
syntax. However, the rule that fills in HEAD
if one of the names is missing, is common to git diff
and other Git commands.)
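A quick way to convince yourself of this is to run the three spellings side by side. The sketch below assumes a scratch repository with a branch named feature/tall checked out; the file contents and identity are invented for the illustration.

```shell
set -eu
repo=$(mktemp -d)                          # scratch repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"

echo "short" > file.txt
git add file.txt
git commit -q -m "commit on master"

git checkout -q -b feature/tall            # HEAD now names feature/tall
echo "taller" >> file.txt
git commit -q -a -m "work on feature/tall"

two_dot=$(git diff master..)               # empty right side is filled in with HEAD
two_args=$(git diff master HEAD)           # the same two commits, spelled out
named=$(git diff master feature/tall)      # HEAD resolves to this branch's tip

echo "$two_dot"
```

All three commands resolve to the same pair of commits, so they should print the same patch.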
[1] Git's git diff
can produce something called a combined diff but these are rather complicated, and not relevant here.
git show
and git log -p
I mentioned git show
above. What git show
does is to find the parent commit automatically for you, and then show you first the metadata—the author (name, email, timestamp) and the log message—and then a diff from parent to child.
When you run git log -p
, this is similar to running git show
on each commit, starting from the child-most and working backwards (note that git log
defaults to starting from HEAD
). That is, first git log
shows you the current branch's tip commit as if by git show HEAD
, then it shows you that commit's parent as if by git show
, then it shows you the parent's parent as if by git show
, and so on.
There is one fairly big difference: git show
will invoke the special combined diff machinery on any merge commits, while git log -p
will just show the log message by default, skipping any attempt at diffing the merge. (There are flags you can use to change this behavior.)
Pull requests are more complicated, because in order to make a pull request, you must either open your repository to someone else who can run git pull
[2] (this is where the term comes from, and is the original meaning of pull request), or else find or create a shared repository, push some of your commits to this shared location, and then ask the other person to obtain your commits from the shared location. I'll ignore the original meaning of "pull request", essentially just an email message asking someone else to run git fetch
, and jump into the way these sites handle it instead.
With services like GitHub and Bitbucket, there are now at least two other repositories involved. They even run a trial merge (though this is not so important, other than to verify that the pull request makes sense). I'm more familiar with GitHub than Bitbucket (I use GitHub myself), but both work the same way here, at least from a sufficiently high level view.
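Before you can even think about pull requests in this model, you must "fork" a repository. (Both providers also accept pull requests between two branches of a single shared repository; in that case the origin and upstream described below are one and the same.) A fork is a clone, but with some extra memory about which repository it was cloned from. [3] Behind the scenes, in a way that you normally don't have to care about, [4] the provider does a lot of storage-sharing so that each fork takes very little space on the provider's servers.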
This forking, though, is why there are two extra repositories involved. This gives us three repositories we must keep track of:
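Your own repository, on your own machine. This is where you run git diff
, too.
Your fork, on the provider's servers. Your repository reaches it through the origin
remote.
The original repository. This does not necessarily have any name in your repository. You can, and perhaps should, add another remote, which in other examples is called upstream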
. It's not always required that you add this, but let's assume you did. If you have not, run:
git remote add upstream <url>
where <url> is the URL of the repository you forked your origin
repository from.
We'll refer, below, to your repository, your origin
, and your upstream
. Remember that these remote names are actually just short names in your own repository referring to another Git at some URL. That's what a remote is: a short name for a URL where there is a Git repository at that URL. We'll use the word provider to mean GitHub or Bitbucket.
[2] The git pull
command is meant as a short-cut for doing git fetch
followed by a second Git command, all with one command. As it turns out, it's often important to use the two commands separately—not always, but often enough that combining them like this was probably a mistake. Probably, the command now named git fetch
should have been named git pull
, and the one now named git pull
could be options you pass to git fetch
, or a pair of convenience shortcut commands: git fm
for fetch-and-merge, and git fr
for fetch-and-rebase. I recommend that new Git users avoid git pull
in favor of the separate commands, at least until they are quite familiar with the separate commands. Nonetheless, this slight historical error is fully baked into Git today, not only in terms of git pull
being the obvious (but incorrect) opposite of git push
, but also in the very name "pull request".
[3] This is over and above (or maybe "beside" is a better description) the way that clones remember their origin through the remote name origin
. In any case forks are more like mirror clones initially, but are not slaved to the repository from which they are forked like mirror clones would be. So they're kind of a hybrid, with extra features—including, specifically, that you can make the service's version of a pull request.
[4] GitHub occasionally brings this up if and when you delete forks vs deleting unforked repositories, since (a) they have to undo the special fork sharing, and (b) deleting forks is safer in that the original (from which you forked) repository is still around. I imagine Bitbucket is similar.
git push
The main thing to know about git push
is that it pushes commits, not files. It does this by calling up some other Git repository. Then it finds out what commits you have that they don't, gives them your commits, and asks them to set some name(s), usually branch names, to remember specific commits.
Now, your fork at origin
belongs to you, so you can git push
to it however and whenever you like. It's a real, actual Git repository (or something that acts just like one), stored on the provider's machines rather than your own, but it's just like your own Git repository in that it has commits, and branch names, and those branch names point to tip commits that point back to previous commits.
When you run git push
, your request to set a branch name, like master
or develop
or feature/tall
, comes with a commit hash ID. If their Git doesn't have that commit, your Git gives their Git that commit. If their Git doesn't have that commit's parent, your Git gives their Git the parent, too. This continues on until you reach some commit their Git does have. Those are what you both shared before you started the git push
.
The commit hash ID you give them is normally the one at the tip of your branch. So if you have:
...--H--I--J <-- master
and you git push origin master
, you are getting your Git to call up their Git and say "I'd like you to set your master
to commit J
". If their Git has their master
pointing to commit H
, and is missing I
and J
, your Git gives them I
and J
, too.
It's possible that their Git has their branch name pointing to some commit you don't have, or that isn't in the chain formed by starting from your branch. Maybe their Git has:
...--H--K <-- master
If so, your request, that they add I
and J
and make their master remember J
, will be denied by default, because this would result in:
       K   [abandoned]
      /
...--H--I--J   <-- master
after which they will "lose" commit K
, possibly for real and forever. Since the Git at origin
belongs to you, though, you can normally use a force push (git push --force
) to turn your polite request into a command: yes, set your master
to J
even though that loses K
! (Usually this is a bad idea and you shouldn't do it. Instead, you should git fetch origin
to bring K
into your own repository, and then either merge or rebase to incorporate K
along with your own I--J
. This gives you a new and different commit, or set of commits, that you can push politely, that won't lose K
. Instead, they will be pure additions of new commits.)
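The refusal and the polite alternative can be reproduced entirely on one machine. In this sketch a local bare repository stands in for the fork at origin, and a second clone (named other here) plays the part of whoever added K. All directory names, file names, and identities are made up for the illustration.

```shell
set -eu
work=$(mktemp -d)                          # scratch area (hypothetical)
cd "$work"
git init -q --bare origin.git              # stands in for the fork at the provider
git -C origin.git symbolic-ref HEAD refs/heads/master

# Your repository: create H and push it.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"
git remote add origin ../origin.git
echo H > h.txt; git add h.txt; git commit -q -m H
git push -q origin master

# Someone else adds K to origin's master.
cd "$work"
git clone -q origin.git other
cd other
git config user.name "Other Person"
git config user.email "other@example.com"
echo K > k.txt; git add k.txt; git commit -q -m K
git push -q origin master

# Meanwhile you add I and J on top of H.
cd "$work/mine"
echo I > i.txt; git add i.txt; git commit -q -m I
echo J > j.txt; git add j.txt; git commit -q -m J

# The polite request is refused, because accepting it would lose K.
if git push -q origin master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi
echo "first push: $first_push"

# Bring K in, combine it with I--J, then ask again.
git fetch -q origin
git merge -q --no-edit origin/master
git push -q origin master
```

A rebase onto origin/master would work in place of the merge here; the merge is used only because it keeps the sketch short.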
Note that these changes—usually pure additions of new commits followed by moving a branch name "forward"—go into your fork. They affect your origin
, but they do not affect your upstream
. That's not your repository after all! You cannot push directly to your upstream.
Instead, what you can do, now that your new commits are in your origin
which is a fork of your upstream
, is to make a pull request, typically using some web interface clicky button. The provider's server will know—you will tell it, if and as necessary—which branch name you want to use in your origin
, and which branch name you want to use in your upstream
.
The provider will then notify whoever actually controls the upstream that you have made this pull request. Since the provider has your fork—your origin
—specially shared with their repository that is your upstream
, they will have direct access to the commits you pushed to your branch, that are now at your origin
's branch tip.
Now we have all the tools we need to find the correct diff. We want to compare their branch tip commit, on the upstream branch you picked out when you made the pull request, to the tip commit of your origin
branch that you set when you ran git push
. If you have those two hash IDs in front of you, you can run git diff <their-upstream-tip-hash> <your-origin-tip-hash>
.
But hash IDs are terribly ugly. It would be nice if we could get Git to translate for us—and we can. I skipped over how git fetch
works above, but let's dive into it for a moment.
git fetch
If you run git fetch upstream
, that tells your Git to call up the Git that answers at the URL you stored under upstream
. That's the Git for the upstream repository at your provider, the one you forked-from. Your Git will call up that Git, obtain any new commits they have that you don't, and drop them into your repository. Then—here's the key trick—your Git will set your remote-tracking branch names for upstream
to record the hash IDs for each of their branch tips, per whatever they have right now.
Their master
becomes your upstream/master
. Their feature/tall
becomes your upstream/feature/tall
. Your Git remembers these for you, along with picking up any new commits they have.
The same holds when you run git fetch origin
: your Git calls up the other Git at origin
—this is your fork at the provider—and loads up any commits origin
has that you don't. Then your Git sets your origin/master
to remember the master
at your origin
, and so on. Note that when you git push
to origin
and give them updates, your Git knows if they take the updates. If they do accept your updates, your Git records the new hash IDs under origin/master
, origin/develop
, and so on.
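To watch the remote-tracking name follow along, here is a small sketch with a local bare repository standing in for origin and a second clone (other, a made-up name) pushing behind your back. Paths and identities are placeholders.

```shell
set -eu
work=$(mktemp -d)                          # scratch area (hypothetical)
cd "$work"
git init -q --bare origin.git              # stands in for the fork at the provider
git -C origin.git symbolic-ref HEAD refs/heads/master

git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"
git remote add origin ../origin.git
echo one > file.txt; git add file.txt; git commit -q -m "first"
git push -q origin master
after_push=$(git rev-parse origin/master)  # recorded because origin accepted the push

# Another clone moves origin's master forward.
cd "$work"
git clone -q origin.git other
cd other
git config user.name "Other Person"
git config user.email "other@example.com"
echo two >> file.txt; git commit -q -a -m "second"
git push -q origin master

cd "$work/mine"
stale=$(git rev-parse origin/master)       # your Git has not asked yet, so this is unchanged
git fetch -q origin
fresh=$(git rev-parse origin/master)       # now matches the master at origin

echo "after push: $after_push"
echo "before fetch: $stale"
echo "after fetch: $fresh"
```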
Hence, as long as your Git is in sync with the two Gits at upstream
and origin
—and if it isn't you can just run git fetch
from upstream
and from origin
to update it—you now have in your own repository the correct commits, named via upstream/theirbranch
and origin/yourbranch
. So, instead of git diff <magic hash 1> <magic hash 2>
, if you've sent a pull request asking your upstream to incorporate your feature/tall
into their develop
, you can git diff upstream/develop origin/feature/tall
.
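Put together, the whole sequence might look like the sketch below. Two local bare repositories stand in for upstream and origin, which means the script can push to upstream directly to seed it; on a real provider you normally could not. Branch names follow the develop and feature/tall example, and everything else (paths, file contents, identity) is invented.

```shell
set -eu
work=$(mktemp -d)                          # scratch area (hypothetical)
cd "$work"
git init -q --bare upstream.git            # stands in for the original repository
git init -q --bare origin.git              # stands in for your fork

git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"
git remote add upstream ../upstream.git
git remote add origin ../origin.git

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git push -q upstream develop               # seeding only; not possible on a real upstream you don't control
git push -q origin develop                 # the fork starts out with the same commits

git checkout -q -b feature/tall
echo "tall" >> file.txt
git commit -q -a -m "make it tall"
git push -q origin feature/tall            # the branch you would open the pull request from

# Make sure both sets of remote-tracking names are current.
git fetch -q upstream
git fetch -q origin

# Compare the two branch tips named in the pull request.
pr_diff=$(git diff upstream/develop origin/feature/tall)
echo "$pr_diff"
```

Adding --stat to the final git diff gives a per-file summary instead of the full patch.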
The two commits you need to diff are those in two other repositories. If those two repositories are set up as remotes upstream
and origin
in your own repository, and your repository is up-to-date with respect to those two repositories, you can git diff
or git log
or git show
the commits in question, and use your remote-tracking names upstream/*
and origin/*
to locate specific branch tips.
You can have commits that aren't in either of these repositories, and you can see what would happen if you pushed these new commits to your own origin
. This allows you to see what would happen if you pushed them and then made a pull request: just compare your upstream/*
remote-tracking name tip commits to your own branch tip commits.