When I enter git merge "my-most-up-to-date-branch"
I get the following error:
The problem is that I don't know where to start, at all. There are differences in .py files, which I can edit by hand. But there are also ones that can't be fixed by hand, like .db and .pyc files.
This is what I get when I enter git mergetool:
Also, I am not sure whether I understand what happens if I enter "m" or "d" in the second picture. Project's most up-to-date version is in the branch "reset-password". How do I solve this issue?
ps: I am sure there is a way to handle this without using merge and making "reset-password" my new "master" branch. However, I really do want to be able to handle this problem by using merge so that I can be able to handle similar problems in my future professional life.
When you run git merge name, there are multiple possible outcomes:
1. Git finds nothing suitable to merge and complains and never even starts the merge. That's not the case here.
2. Git starts the operation, and is able to complete it on its own, because one of two things is true: either the merge is really a fast-forward, or it is a true merge in which Git finds no conflicts. In these cases, the merge is finished, and you can go on and do more things in Git. That's also not the case here.
3. Git starts the operation, but is unable to complete it. You're left with a mess.
Case 3 had already happened at some point, before you ran git merge reset-password. Once case 3 has happened, you must clean up the mess before you can proceed. Running git merge again gets you the output you showed in your first image.
(Note: you can get stuck in this same situation with git cherry-pick and git revert or any command that invokes these. Since git rebase is performed by repeated cherry-pick operations, rebases too can leave you with merge conflicts. I am just guessing here that you had an earlier merge that never completed, based on the >M< in your shell prompt. It seems that most setups use >R< or >R for an incomplete rebase.)
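If you'd rather not rely on the shell prompt, you can ask Git directly whether a merge is still in progress. The following is a minimal sketch, not a transcript of your repository: the temporary repository, the branch names ours and theirs, and the file file.txt are all made up for illustration. It relies on the fact that Git keeps a MERGE_HEAD ref around while a merge is unfinished.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Base commit, then two (hypothetical) branches that change the same line.
echo "base" > file.txt
git add file.txt && git commit -q -m "base"
git checkout -q -b ours
echo "our change" > file.txt
git commit -q -am "ours"
git checkout -q -b theirs HEAD~1
echo "their change" > file.txt
git commit -q -am "theirs"
git checkout -q ours

# This merge stops with a conflict (case 3), so it exits non-zero.
if git merge theirs >/dev/null 2>&1; then
    first_merge=completed
else
    first_merge=stopped
fi

# While a merge is unfinished, MERGE_HEAD exists.
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then
    merge_state="in progress"
else
    merge_state="none"
fi
echo "merge state: $merge_state"

# Asking for another merge now is refused until the first one is
# resolved or aborted.
if git merge theirs >/dev/null 2>&1; then
    second_merge=accepted
else
    second_merge=refused
fi
echo "second merge: $second_merge"
```

In a real repository, only the git rev-parse -q --verify MERGE_HEAD line is needed; everything before it just builds a conflict to test against.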
Problem is that I don't know where to start, at all ...
Well, you sort of do, because you've tried git mergetool. But this is jumping right into the deep end of Git. Unfortunately, pretty much all approaches involve jumping into the deep end, here. 😅 Git sort of forces you to learn about the mess it leaves behind, and it's not simple.
You may already know some of this, but it's worth at least a quick scan just in case. First, Git is really all about commits. Commits are numbered, but Git (and you) will generally find these numbers by branch names, because of these facts about commits:
Commits are numbered, but the numbers are big and ugly. They look like, for instance, e1cfff676549cdcd702cbac105468723ef2722f4. These numbers might seem random, but in fact, they are cryptographic checksums of the contents of some internal Git object. Each Git commit is a unique internal object, and thus gets a unique checksum.[1]
Commits contain two things: a full snapshot of every source file that Git knew about at the time you (or whoever) made the commit, and some metadata. The metadata include stuff like the name and email address of whoever made the commit. Crucially for Git, the metadata also include the raw hash ID of the previous commit, or for a merge commit, two or more previous commits. Git calls the previous commit the parent (and by implication, the commit itself is therefore a child of that parent).
The fact that the hash IDs are checksums means nothing about any commit can ever be changed. All commits are completely read-only. The files inside each commit are also read-only; to save space, they're compressed and stored in a Git-only format, with de-duplication.
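To see the "checksum of the contents" idea in action, here is a small sketch using git hash-object. It hashes file content (a blob object) rather than a commit, which is a simplification, but the same rule applies to every kind of internal object: identical content gives an identical hash ID, which is also what makes de-duplication possible. The sample strings are arbitrary.

```shell
set -e
# Hashing the same content twice gives the same ID.
id1=$(printf 'hello\n' | git hash-object --stdin)
id2=$(printf 'hello\n' | git hash-object --stdin)

# Changing a single character gives a different ID.
id3=$(printf 'hellp\n' | git hash-object --stdin)

echo "$id1"
echo "$id2"
echo "$id3"
```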
There are some important consequences of these three points that we'll go over extremely fast here:
Commits form chains. Since it's the child commit that holds the parent's hash ID—it has to be; the child's hash ID isn't predictable when we make the parent—these chains point backwards.
A branch name simply holds the hash ID of the last commit in some chain. However, being the last commit in some chain doesn't mean there cannot be more commits after this point: another branch name can point to a later commit.
Many commits are on multiple branches. The very first commit in a repository, which has no parent because there's no commit before it, is pretty commonly on every branch. (The only way for it not to be on every branch is to have more than one of these "first" or root commits. We won't look at how this can come about, here.)
Because the files inside a commit are read-only, the files that you work on (or with) are not in a commit. In an important sense, they are not in the repository itself at all.
The sections below aren't about merging at all, yet. We'll get to that in a larger heading, in a bit.
[1] Pay no attention to the pigeonhole principle here, or see "How does the newly found SHA-1 collision affect Git?"
Let's expand a bit on that last bullet point. To get any actual work done, you need to get files out of some commit. Git will do this by extracting the frozen, compressed, and de-duplicated files (which sometimes aren't normal OS files at all, and which all have hash-ID names internally) into regular everyday files, putting those into a work area. This work area is not inside the repository.[2]
Git calls this work area your working tree or work-tree. Since this area is yours, you can create other files and directories/folders here, if you like. The files that Git knows about are, at least initially, those that Git just extracted from some existing commit. If you use the OS to create additional files, Git doesn't know about them, although Git will normally take care not to clobber them by accident.[3]
Pretty much all version control systems work like this: there are committed files, which are saved for all time,[4] and some more-temporary ones that you can actually work on/with. This part, most people don't find confusing at all. Most other version control systems stop here, but Git being Git, it doesn't.
[2] The repository itself is typically stored in a hidden .git directory at the top level of the work area. That is, the repository is in the work-tree, rather than the other way around! This is not always a sensible arrangement, and submodule repositories are normally moved out of the way, in modern Git, lest this part of your work-tree get removed.
[3] Listing a file in a .gitignore sometimes gives Git permission to clobber it, and some Git commands, such as git clean, are supposed to destroy such files anyway. So this is not a total guarantee of safety. But in general, you can create files in your work-tree, and not have Git ruin them. You'll see complaints from Git now and then about some work-tree file being in the way of a git checkout or git merge operation: Git is just telling you Hey, I found this file of yours, and if I overwrite it now, from a committed file, I'll be clobbering your data, so maybe you should move it out of the way first.
[4] Or saved for as long as you don't tell the system to forget that commit, or whatever. The details of this vary, quite a lot, from one version control system to another.
In other version control systems (VCSes), you check out some commit, and now you have a bunch of useful files. You make changes to those files, and when you are ready, you tell the VCS: commit these. It goes and finds what you did, and commits that. Some of these VCSes can be excruciatingly slow here. Git tends to be blazing-fast. It gets this speed at a price. That price has useful (to you) side benefits, but it's definitely confusing, and it's time to learn all about it.
Instead of just having two copies of each file, Git stores three. One of the three is the frozen (and de-duplicated) file in the current commit. You picked out some commit to work on, so that commit—with its big ugly hash ID—is the current commit, and that commit has a snapshot of all files.
At the opposite end, as it were, Git has copied all of those files out of the commit, into your work-tree. These are ordinary everyday files that you can do anything with.
Between these two copies, though, Git keeps a third "copy". The word "copy" is in quotes here because this third one is in the frozen form, and is pre-de-duplicated. Initially, all of these match the copies in the commit. This extra copy lives in something that Git calls the index, or the staging area, or sometimes—rarely these days—the cache. All three of these names are for the same thing. It has three names perhaps because index doesn't mean anything, and cache is too specific: the name staging area reflects its role.
When you go to make a new commit, Git uses the ready-to-go files that are in Git's index. Since they're in the right format, Git can make a new commit very quickly. But this means that if you change your copy in your work-tree, you have to tell Git to replace the index copy. The copy in the index is in the frozen format but isn't in a commit and therefore is not actually frozen.
The git add command is how you do all this. What git add file really means is make the index copy of file match the work-tree copy. Git will replace the old index copy with a new one, compressing and de-duplicating the file at git add time, to make it ready to be committed. This means that instead of git commit being slow, it's git add that's slow, but you only have to do it on files that you changed, so it's not really that slow.
All of this, in turn, means that what's in the index (or staging area) is, in effect, your proposed next commit. Git has filled it in from the current commit, when Git extracted that commit. Git copied the commit to both Git's index and your work-tree. Now that you've changed stuff, or maybe added or even removed some files, you must update Git's index to match. You do this with git add, or git rm if you want to just remove stuff. This updates Git's index, and hence your proposed next commit.
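The three copies can be observed directly. In this sketch (the temporary repository and the file notes.txt are invented for the example), git diff compares the index against the work-tree, and git diff --cached compares the current commit against the index, so you can watch git add move a change from one comparison to the other.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "version 1" > notes.txt
git add notes.txt && git commit -q -m "first"

# Change the work-tree copy only.
echo "version 2 (edited)" > notes.txt

# The commit and the index still agree, so nothing is staged yet...
staged_before=$(git diff --cached --name-only)
# ...but the index and the work-tree now differ.
unstaged_before=$(git diff --name-only)

# Copy the work-tree version into the index.
git add notes.txt
staged_after=$(git diff --cached --name-only)
unstaged_after=$(git diff --name-only)

# :notes.txt names the index copy of the file.
index_copy=$(git show :notes.txt)
echo "index now holds: $index_copy"
```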
Before we move on to how merge works, let's take a moment to observe the process of making regular everyday non-merge commits—commits with just one parent, in other words. We start with a simple linear chain of commits, ending with some particular last commit with a hash ID:
... <-F <-G <-H <-- somebranch (HEAD)
Here H stands in for the actual hash ID of the last commit in the chain. Commit H holds a snapshot and metadata. Git can find the commit by its hash ID, and the hash ID is in the name somebranch. In the metadata for commit H, Git can find the hash ID of the earlier (parent) commit G, so using somebranch to find H lets Git find G. Commit G of course has a snapshot and metadata, and the metadata include the hash ID of its parent F. This has a snapshot and a hash ID again. So given just the branch name, Git can find all the commits.
Let's make a second branch name that points to the same commit:
...--F--G--H <-- somebranch (HEAD), anotherbranch
We're still using commit H. The (HEAD) here tells us that we're using the name somebranch to find commit H. If you git checkout anotherbranch, we'll start using the name anotherbranch instead, but still find commit H:
...--F--G--H <-- somebranch, anotherbranch (HEAD)
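If you now modify some files and git add them to put the updated files into Git's index, you can now run git commit to make a new commit. Git will:
gather the metadata, such as your name and email address, the current date and time, and your log message;
use the current commit H as the new commit's parent;
freeze the files in Git's index into the new snapshot; and
write all of this out as a new commit, which gets a new, unique hash ID that we'll just call I.
There's one more step, but let's draw commit I now: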
...--F--G--H
            \
             I
Now let's add the branch names, after we note that the last step for git commit is that Git writes the new hash ID into the current branch name (the one with the attached HEAD):
...--F--G--H   <-- somebranch
            \
             I   <-- anotherbranch (HEAD)
Now commits up through H are on both branches, and new commit I is only on anotherbranch.
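The drawing above can be reproduced in a throwaway repository. This is only a sketch: the temporary directory and file.txt are made up, and the shell variables H and I simply hold whatever hash IDs Git happens to assign.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "one" > file.txt
git add file.txt && git commit -q -m "H"

# Two names for the same commit, with HEAD attached to anotherbranch.
git checkout -q -b somebranch
git branch anotherbranch
git checkout -q anotherbranch
H=$(git rev-parse somebranch)

# Make new commit I while on anotherbranch.
echo "two" >> file.txt
git add file.txt
git commit -q -m "I"

I=$(git rev-parse anotherbranch)
parent_of_I=$(git rev-parse anotherbranch^)
somebranch_tip=$(git rev-parse somebranch)

# Show the graph with branch names attached.
git --no-pager log --oneline --graph --decorate somebranch anotherbranch
```

Only the name with HEAD attached moved: somebranch still points to H, and the parent recorded in I is H.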
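We are now ready to tackle Git's merge operation. Let's consider, first, these facts:
Each commit holds a full snapshot of files, not a set of changes.
The goal of a merge is to combine changes: what we changed, and what they changed.
To turn snapshots into changes, Git has to compare each branch tip against some common starting snapshot.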
We start with a situation like this:
          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2
That is, the name branch1 selects some commit (which we'll call J) and the name branch2 selects some other commit that we'll call L. The one we are using right now is J: that's what's in Git's index and in our work-tree.
When we run git merge branch2, Git uses HEAD to locate our commit J, and uses the name branch2 that we gave as an argument to locate their commit L. But now Git needs to figure out what we changed and what they changed. That means Git has to find some earlier commit.
The right earlier commit is not always obvious, but what Git needs is a commit that is on both branches. Commit H is on both branches; so is commit G, and anything earlier. It kind of stands to reason, though, that the best commit is probably the one "closest to the ends": that is, commit H is "better" than commit G, because comparing the snapshot in H against either later commit will probably find fewer changes than comparing the snapshot in G, or anything earlier.
We call this "right commit" the merge base, and in any case, Git finds the merge base on its own here. In this easy case, it's easy to see that Git will pick commit H. In more complex graphs, using git merge-base --all may be the only sane way to see what Git is picking.[5]
To find what we changed, Git now runs, in effect:
git diff --find-renames <hash-of-H> <hash-of-J> # what we changed
A very similar command finds what they changed:
git diff --find-renames <hash-of-H> <hash-of-L> # what they changed
Git also, at this point, actually reads all three commits into the index.
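You can run these same steps by hand before merging, to preview what each side changed. The sketch below builds a tiny version of the graph above; the temporary repository and the files a.txt and b.txt are assumptions for illustration, while branch1 and branch2 match the names in the drawing.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Commit H: the shared starting point.
printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt && git commit -q -m "H"
H=$(git rev-parse HEAD)

# Our side changes a.txt; their side changes b.txt.
git checkout -q -b branch1
echo "ours" >> a.txt && git commit -q -am "J"
git checkout -q -b branch2 "$H"
echo "theirs" >> b.txt && git commit -q -am "L"
git checkout -q branch1

# The merge base Git would use for "git merge branch2".
base=$(git merge-base HEAD branch2)

# What we changed, and what they changed, relative to that base.
ours_changed=$(git diff --find-renames --name-only "$base" HEAD)
theirs_changed=$(git diff --find-renames --name-only "$base" branch2)

echo "merge base:     $base"
echo "we changed:     $ours_changed"
echo "they changed:   $theirs_changed"
```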
[5] Git uses a lowest common ancestor algorithm here. When applied to a Directed Acyclic Graph, there may be more than one LCA. The --all to git merge-base --all tells this command to print out all LCAs. Different merge strategies may use just one merge base, or all of them; we won't go into the details here.
Earlier we saw that the index had a copy of each file. This is the normal state for the index, but during a merge, the index actually expands. Instead of one copy, it holds three:
slot 1 holds the copy from the merge base;
slot 2 holds the copy from our commit (--ours, i.e., HEAD);
slot 3 holds the copy from their commit (--theirs).
That is, if the merge base, our commit, and their commit all have a README.md file, the index now has three README.md files in it. We can name these using a digit and some colons, with some Git commands,[6] e.g.:
git show :1:README.md # view the merge base copy
git show :2:README.md # view the `--ours` copy
git show :3:README.md # view the `--theirs` copy
This repeats for every file in the three commits. Some of the commits might not have all the file names, and the git diff --find-renames above might find that, from commit H to L, they renamed some file, for instance; in this case the index entries are a little trickier. Or perhaps we or they deleted a file, or added a whole new file, in which case there's no slot-1 entry but there is a slot-2 or slot-3 entry. You have these cases, so we can't ignore them. But they're a little more complicated, so for now, we will ignore them. The rest is pretty straightforward:
If Git was able, from the above, to figure out which version to use, Git just moves that version from these nonzero numbered slots to slot number zero, and erases the higher-numbered slots. A slot-zero entry is the normal "this is the file, ready to be committed" copy. So that file is now resolved. Git puts the chosen copy into your work-tree as well.
If not, Git goes on to try a low-level merge of the file.[7]
[6] Most Git commands that can take a hash ID can take names that Git resolves into a hash ID. This resolution is done through the rules outlined in the gitrevisions documentation. So git rev-parse :1:README.md prints out the internal blob hash ID for that file. When using git show or git cat-file -p, you can give it either the hash ID, or the name; they'll run the name through an internal rev-parse as needed.
[7] You can specify a merge driver instead of letting Git use its built in one. This also gets somewhat complicated.
Suppose that we have three different versions of run.py in the index, and the diff from base to ours says to make a change to line 42, while the diff from base to theirs says to make a different change to line 54. Git will simply take both changes and apply them to the merge base copy of the file.
If we and they changed the same line(s), Git will compare what we both used as the new replacement(s) for them. If our replacements match, Git will take one copy of this change.
If we and they changed the same lines but to different text, Git will declare a merge conflict in this file, and will arrange for the merge to stop in the middle. The extended (-X) options can tell Git not to stop after all (by telling it to favor ours or theirs), but we'll skip over these.
If there are no merge conflicts after combining our changes and their changes, Git will, as usual, put the result into index slot zero and your work-tree. This file is also resolved.
If Git isn't able to resolve the conflict, the low level merge code will write its best-effort at merging the three files to your work-tree. (What happens to Git's index, well, we'll leave that for the next section.) The work-tree file will use the combined changes wherever they didn't conflict, and where they did, will contain lines from both "sides" of the merge. If you set merge.conflictStyle to diff3, the conflicted region will include the corresponding lines from the merge-base version of the file. I like to set this option always; I find the resulting conflicts easier to read.
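Here is a small sketch of what that best-effort work-tree file looks like with the diff3 style. The repository, the branch names ours and theirs, and the file lines.txt are invented for the example. Our side changes the first and last lines, their side changes only the last line, so the first-line change should be taken cleanly and the last line should end up between conflict markers.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven\n' > lines.txt
git add lines.txt && git commit -q -m "base"

git checkout -q -b ours
printf 'ONE (ours)\ntwo\nthree\nfour\nfive\nsix\nseven (ours)\n' > lines.txt
git commit -q -am "ours"

git checkout -q -b theirs HEAD~1
printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven (theirs)\n' > lines.txt
git commit -q -am "theirs"
git checkout -q ours

# -c sets merge.conflictStyle for this one command only.
if git -c merge.conflictStyle=diff3 merge theirs >/dev/null 2>&1; then
    outcome=merged
else
    outcome=conflict
fi

# The work-tree file: non-conflicting change applied, conflict marked up.
cat lines.txt
first_line=$(head -n 1 lines.txt)
```

To make the setting permanent, git config --global merge.conflictStyle diff3 stores it in your personal configuration.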
In the section above, I talked about how Git handles conflicts within the three versions of some file, where there is a merge base copy, the --ours copy, and the --theirs copy, and all three differ. But let's see what happens with these cases:
Suppose they delete a file and we don't do anything to it. What should Git do with this? Git's answer is take the deletion: Git keeps the file deleted in the merge result, by emptying out all index slots, including slot zero, and making sure that the file isn't there in your work-tree.
Suppose we delete a file and they don't do anything to it. Git handles this the same way.
Suppose we, or they, delete a file, and they or we—the other side—modify the file. Git's answer is to declare a merge conflict and just leave two of the three copies in the index. Git calls this a modify/delete conflict.
Suppose we rename a file (without changing its content), and they don't change it, or do change it but don't rename it. Git's answer is to combine both changes: take their changes if any, and use our new file name. The same applies if they rename it and we don't. If we both modify the file, and the low level code can combine the content changes, Git resolves the file by taking both the rename and the combined content change.
If we both rename the file, but to different new names, Git calls this a rename/rename conflict.
If we both create all-new files, with different content but the same name, Git calls this an add/add conflict.
These conflicts that involve file names or entire file creation/deletion are all high level or tree conflicts, because they don't involve low-level content conflicts. We can even get both high and low level conflicts, e.g., with a rename/rename conflict plus a low level conflict; but the main point here is that if we do get one of these high level conflicts, the extended (-X ours and -X theirs) options have no effect: those options are only handled by the low-level merge code.[8]
In any case, if Git does stop with a merge conflict, it leaves the nonzero slot number entries in its index. This leaves the two or three input files available to commands like git mergetool, and leaves enough traces for git mergetool to diagnose high level conflicts such as modify/delete conflicts.
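Those leftover entries can be listed with git ls-files -u (short for --unmerged). The sketch below sets up a modify/delete conflict in a scratch repository; the file helper.py and the branch names are hypothetical. Since their side deleted the file, only the merge base (slot 1) and our copy (slot 2) should remain.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "docs" > README.md
echo "print('v1')" > helper.py
git add README.md helper.py && git commit -q -m "base"

# We modify the file...
git checkout -q -b ours
echo "print('v2')" > helper.py
git commit -q -am "modify helper.py"

# ...while they delete it.
git checkout -q -b theirs HEAD~1
git rm -q helper.py
git commit -q -m "delete helper.py"
git checkout -q ours

if git merge theirs >/dev/null 2>&1; then
    outcome=merged
else
    outcome=conflict
fi

# Each output line is: mode, blob hash, slot number, then the path.
git ls-files -u

# Collect just the slot numbers that are present for the file.
slots=$(git ls-files -u | awk '{printf "%s", $3}')
echo "slots present: $slots"
```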
[8] There may, in the future, be some fancier high level conflict handlers that do allow some -X options. But today there aren't.
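We now know what kind of mess Git leaves behind: an index in which each conflicted file sits in the nonzero-numbered slots instead of slot zero, and work-tree files that hold Git's best-effort partial merges.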
Your job is to finish the merge. You may do this any way you like.
You don't have to use the higher-numbered index entries, but if you want to, git mergetool gives you a convenient way to access them, that does not require fumbling around with git show :1:file.ext, git show :2:file.ext, and git show :3:file.ext and a lot of temporary files: git mergetool does that for you.
You don't have to use the work-tree copies of the files, with their partial merges.
You do have to run git add or git rm, but git mergetool can do that for you too. To mark the conflict resolved, you will either remove the index copies entirely (meaning that the final commit won't have the file at all) or write, to index slot zero, the correct merge result.
In your particular case, you have __pycache__/*.pyc files listed (four of them) and two other files, app.db and run.py.
The __pycache__ files should almost never be in a Git repository. Your merge conflict for one of them shows that one side of the merge (the --ours side, i.e., merge base vs HEAD) had modified the file, while the other side of the merge had removed the file, in the two git diffs that git merge ran.
The correct resolution here would be to take their change, i.e., to remove the file entirely. For git mergetool, then, the answer would be d: use the deletion, rather than keeping your modified file.
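If you prefer to skip git mergetool for these files, taking the deletion by hand is a git rm on the conflicted path. This is a sketch in a scratch repository, not your project: the path __pycache__/example.pyc, its text content, and the branch names are stand-ins.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Base commit accidentally includes a compiled file (stand-in content).
mkdir __pycache__
echo "source" > run.py
echo "compiled v1" > __pycache__/example.pyc
git add run.py __pycache__/example.pyc && git commit -q -m "base"

git checkout -q -b ours
echo "compiled v2" > __pycache__/example.pyc
git commit -q -am "recompiled"

git checkout -q -b theirs HEAD~1
git rm -q __pycache__/example.pyc
git commit -q -m "stop tracking compiled file"
git checkout -q ours

# Modify/delete conflict: the merge stops.
git merge theirs >/dev/null 2>&1 || true

# Take the deletion: this clears every index slot for the path
# and removes the work-tree copy.
git rm -q -- __pycache__/example.pyc >/dev/null 2>&1

remaining=$(git ls-files -u)

# Nothing is left unmerged, so the merge can be committed.
git commit -q --no-edit
second_parent=$(git rev-parse HEAD^2)
theirs_tip=$(git rev-parse theirs)
```

With several such files, you can pass all of their paths to one git rm command.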
For app.db, the correct result is probably not your file, but might not be their file either. The correct result might be some combination of both files. If the database is binary, Git's simple newline-based text substitution rules, for combining two git diffs and applying the combined changes to the merge base copy, simply don't work at all. It's up to you how to produce the correct final app.db copy, but let's assume there is a magic command that can read both app.db input files and produce the right result. You might run:
git show :2:app.db > app.db.mine
git show :3:app.db > app.db.theirs
magic-combiner -o app.db app.db.mine app.db.theirs
which combines them and writes the correct combined data to app.db. Now that your work-tree copy is what you want to commit, you would just run:
git add app.db
This erases the three numbered slots (:1:app.db, :2:app.db, and :3:app.db are all gone) and copies (and compresses and freezes and de-duplicates) the current app.db into index-slot-zero.
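There is no real magic-combiner, of course. One simpler alternative, if after inspection you decide that one side's copy is simply the right one, is to check out that side's slot and add it. The sketch below assumes their copy wins; that is an assumption for the example, not a recommendation for your database. The repository, branch names, and file contents are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# The NUL byte (\000) makes Git treat the file as binary.
printf 'base\000data\n' > app.db
git add app.db && git commit -q -m "base"

git checkout -q -b ours
printf 'ours\000data\n' > app.db
git commit -q -am "our database"

git checkout -q -b theirs HEAD~1
printf 'theirs\000data\n' > app.db
git commit -q -am "their database"
git checkout -q ours

# Git cannot combine binary changes, so the merge stops.
git merge theirs >/dev/null 2>&1 || true

# Put their copy (slot 3) into the work-tree, then mark it resolved.
git checkout --theirs -- app.db
git add app.db

remaining=$(git ls-files -u)
picked=$(git rev-parse :0:app.db)
theirs_blob=$(git rev-parse theirs:app.db)
```

Using git checkout --ours -- app.db instead keeps your copy.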
For run.py, perhaps you should look at their file and your file, and perhaps the merge base version as well, in an editor or merge tool or whatever you will use to figure out what the correct merge result is. Or perhaps the work-tree copy, with Git's attempt at merging, is sufficient for you to figure out what should be in that file. The git mergetool command is likely to offer you a way to run some merge tool over all three inputs. I prefer to just edit run.py in the editor and figure it out (using the three sections from my diff3 setting for merge.conflictStyle) in most cases.
If you have git mergetool run a tool, then either:
git mergetool knows a lot about this tool and can trust it to exit with a status code that says "all merged, use the result" or "not merged, don't use", and git mergetool will run git add for you or not, correctly; or
git mergetool doesn't know enough about the tool, but will run it and then ask you if it should use the result.
If git mergetool uses the result, it will do its own git add run.py. If not, you still have the three copies in the index; you can open run.py in your favorite editor, look it over, and decide whether it's all correct or needs more changes. You can run tests, and so on.
Even if git mergetool does add the file, you can still look it over and run tests. Resolving the file just means getting the index set up so that Git thinks the merge is done.
If Git thinks it did the merge on its own, Git will make a new merge commit:
          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2
This merge commit has a hash ID, like any commit. It has a snapshot, like any commit. It has metadata, like any commit, with one difference: it lists commit J as its first parent, so that M points back to J, but then it also lists commit L as its second parent, so that M also points back to L. Now commits H-I-J-M are on branch1 (plus earlier commits) but so are H-K-L-M (plus earlier commits). So now all the commits that were only on branch2 before, are on both branches. New commit M is only on branch1, and, as usual, is the new tip of the branch: Git wrote M's hash ID into the name branch1.
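The two parents are easy to check after a merge that Git completes on its own. In this sketch (scratch repository; a.txt and b.txt are invented, branch1 and branch2 follow the drawing), HEAD^1 and HEAD^2 name the first and second parents of the merge commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt && git commit -q -m "H"

# Non-overlapping changes, so Git can finish the merge by itself.
git checkout -q -b branch1
echo "ours" >> a.txt && git commit -q -am "J"
J=$(git rev-parse HEAD)
git checkout -q -b branch2 HEAD~1
echo "theirs" >> b.txt && git commit -q -am "L"
L=$(git rev-parse HEAD)
git checkout -q branch1

# --no-edit accepts the default merge message without opening an editor.
git merge -q --no-edit branch2

first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
branch2_tip=$(git rev-parse branch2)

git --no-pager log --oneline --graph --decorate branch1 branch2
```

Note that branch2 itself did not move: it still points to L, while branch1 now points to the new merge commit.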
If Git doesn't make the merge commit on its own due to merge conflicts, you resolve each conflicted file, run git add (or git rm) on the results, and then run git merge --continue or git commit,[9] and Git will now make merge commit M as before, with the two parents. Or, you can run:
git merge --abort
to erase the index (well, reset it to match J, really) and put your work-tree back to matching commit J, and you'll be back in the situation you were in before you started the merge. (Any work you did to resolve the merge is gone, so be a bit careful here!)
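Since the leftover merge in your repository is what blocks git merge reset-password, aborting may be a reasonable way to get back to a clean starting point, as long as there is no resolution work you want to keep. The following sketch shows the effect in a scratch repository; the branch names and file.txt are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt && git commit -q -m "base"
git checkout -q -b ours
echo "our change" > file.txt
git commit -q -am "ours"
git checkout -q -b theirs HEAD~1
echo "their change" > file.txt
git commit -q -am "theirs"
git checkout -q ours

before=$(git rev-parse HEAD)

# Start a merge that stops with a conflict.
git merge theirs >/dev/null 2>&1 || true

# Throw the half-done merge away.
git merge --abort

after=$(git rev-parse HEAD)
status=$(git status --porcelain)
content=$(cat file.txt)
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then
    merge_state="in progress"
else
    merge_state="none"
fi
echo "merge state after abort: $merge_state"
```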
[9] All git merge --continue really does is make sure that you're in the middle of a merge, then run git commit. So it's a bit of safety, in that it won't do anything if you think you're in a conflicted merge, but somehow you aborted it earlier, or finished it. Usually in that situation git commit will tell you there's nothing to commit, too, so this is rarely important.