Tags: git, unix, git-merge, git-merge-conflict

How do I solve "error: Merging is not possible because you have unmerged files."?


When I enter git merge "my-most-up-to-date-branch" I get the following error:

[Screenshot: output of git merge, ending in "error: Merging is not possible because you have unmerged files."]

The problem is that I don't know where to start, at all. There are differences in .py files, which I can edit by hand. But there are also files that can't be fixed by hand, like .db and .pyc files.

This is what I get when I enter git mergetool:

[Screenshot: output of git mergetool]

Also, I am not sure whether I understand what happens if I enter "m" or "d" in the second picture. Project's most up-to-date version is in the branch "reset-password". How do I solve this issue?

P.S. I am sure there is a way to handle this without using merge, by making "reset-password" my new "master" branch. However, I really do want to solve this problem with merge, so that I can handle similar problems in my future professional life.


Solution

  • When you run git merge name, there are multiple possible outcomes:

    1. Git finds nothing suitable to merge and complains and never even starts the merge. That's not the case here.

    2. Git starts the operation, and is able to complete it on its own, because one of two things is true:

      • the operation finds that a true merge is not required: that a fast forward will serve instead, and you allow the fast forward and Git does the fast forward instead of merging; or
      • the operation finds that a true merge is required, or that fast forward was disallowed, but is able to do a true merge on its own, so it does that and makes a merge commit.

      In these cases, the merge is finished, and you can go on and do more things in Git. That's also not the case here.

    3. Git starts the operation, but is unable to complete it. You're left with a mess.

    Case 3 had already happened at some point, before you ran git merge reset-password. Once case 3 has happened, you must clean up the mess before you can proceed. Running git merge again gets you the output you showed in your first image.

    (Note: you can get stuck in this same situation with git cherry-pick and git revert or any command that invokes these. Since git rebase is performed by repeated cherry-pick operations, those too can leave you with merge conflicts. I am just guessing here that you had an earlier merge that never completed, based on the >M< in your shell prompt. It seems that most setups use >R< or >R for an incomplete rebase.)
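    A minimal sketch of how to confirm that an earlier merge is still unfinished, assuming a throwaway repository built in a temporary directory. The branch names (main, other) and the file contents are made up for illustration; only the three checks at the end matter in a real repository.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Base commit, then two branches that change the same line differently.
echo "base" > run.py
git add run.py && git commit -q -m "base"
git branch -M main
git checkout -q -b other
echo "theirs" > run.py && git commit -q -am "their change"
git checkout -q main
echo "ours" > run.py && git commit -q -am "our change"

# The first merge stops with a conflict (case 3), so git merge exits non-zero.
git merge other || true

# Check 1: MERGE_HEAD exists only while a merge is in progress.
merge_head=$(git rev-parse -q --verify MERGE_HEAD)
# Check 2: the index lists the unmerged paths.
unmerged=$(git ls-files -u | cut -f2 | sort -u)
echo "unmerged paths: $unmerged"
# Check 3: a second git merge refuses to start.
second_status=0
second_output=$(git merge other 2>&1) || second_status=$?
echo "$second_output"
```

    In practice, plain git status reports the same state ("You have unmerged paths") without any of this scaffolding.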

    Problem is that I don't know where to start, at all ...

    Well, you sort of do, because you've tried git mergetool. But this is jumping right into the deep end of Git. Unfortunately, pretty much all approaches involve jumping into the deep end, here. 😅 Git sort of forces you to learn about the mess it leaves behind, and it's not simple.

    Things to know beforehand

    You may already know some of this, but it's worth at least a quick scan just in case. First, Git is really all about commits. Commits are numbered, but Git (and you) will generally find these numbers by branch names, because of these facts about commits:

    • Commits are numbered, but the numbers are big and ugly. They look like, for instance, e1cfff676549cdcd702cbac105468723ef2722f4. These numbers might seem random, but in fact, they are cryptographic checksums of the contents of some internal Git object. Each Git commit is a unique internal object, and thus gets a unique checksum.[1]

    • Commits contain two things: a full snapshot of every source file that Git knew about at the time you (or whoever) made the commit, and some metadata. The metadata include stuff like the name and email address of whoever made the commit. Crucially for Git, the metadata also include the raw hash ID of the previous commit, or for a merge commit, two or more previous commits. Git calls the previous commit the parent (and by implication, the commit itself is therefore a child of that parent).

    • The fact that the hash IDs are checksums means nothing about any commit can ever be changed. All commits are completely read-only. The files inside each commit are also read-only; to save space, they're compressed and stored in a Git-only format, with de-duplication.

    There are some important consequences of these three points that we'll go over extremely fast here:

    • Commits form chains. Since it's the child commit that holds the parent's hash ID—it has to be; the child's hash ID isn't predictable when we make the parent—these chains point backwards.

    • A branch name simply holds the hash ID of the last commit in some chain. However, being the last commit in some chain doesn't mean there cannot be more commits after this point: another branch name can point to a later commit.

    • Many commits are on multiple branches. The very first commit in a repository, which has no parent because there's no commit before it, is pretty commonly on every branch. (The only way for it not to be on every branch is to have more than one of these "first" or root commits. We won't look at how this can come about, here.)

    • Because the files inside a commit are read-only, the files that you work on (or with) are not in a commit. In an important sense, they are not in the repository itself at all.
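    To see the snapshot-plus-metadata structure and the backwards-pointing parent link for yourself, here is a small sketch. It assumes a scratch repository in a temporary directory; the file name and commit messages are invented.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)
echo "v2" > file.txt
git commit -q -am "second"

# A commit object: a tree (the snapshot) plus metadata, including the parent hash ID.
git cat-file -p HEAD

# The parent recorded in the second commit is the first commit's hash ID.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
# The root commit has no parent line at all.
root_parents=$(git cat-file -p "$first" | grep -c '^parent ' || true)
echo "parent of second commit: $parent"
echo "parent lines in root commit: $root_parents"
```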

    The sections below aren't about merging at all, yet. We'll get to that in a larger heading, in a bit.


    [1] Pay no attention to the pigeonhole principle here, or see "How does the newly found SHA-1 collision affect Git?"


    Extracting a commit: your work-tree

    Let's expand a bit on that last bullet point. To get any actual work done, you need to get files out of some commit. Git will do this by extracting the frozen, compressed, and de-duplicated files (which sometimes aren't normal OS files at all, and which all have hash-ID names internally) into regular everyday files, putting those into a work area. This work area is not inside the repository.[2]

    Git calls this work area your working tree or work-tree. Since this area is yours, you can create other files and directories/folders here, if you like. The files that Git knows about are, at least initially, those that Git just extracted from some existing commit. If you use the OS to create additional files, Git doesn't know about them, although Git will normally take care not to clobber them by accident.[3]

    Pretty much all version control systems work like this: there are committed files, which are saved for all time,[4] and some more-temporary ones that you can actually work on/with. This part, most people don't find confusing at all. Most other version control systems stop here, but Git being Git, it doesn't.


    [2] The repository itself is typically stored in a hidden .git directory at the top level of the work area. That is, the repository is in the work-tree, rather than the other way around! This is not always a sensible arrangement, and submodule repositories are normally moved out of the way, in modern Git, lest this part of your work-tree get removed.

    [3] Listing a file in a .gitignore sometimes gives Git permission to clobber it, and some Git commands, such as git clean, are supposed to destroy such files anyway. So this is not a total guarantee of safety. But in general, you can create files in your work-tree, and not have Git ruin them. You'll see complaints from Git now and then about some work-tree file being in the way of a git checkout or git merge operation. Git is just telling you: "Hey, I found this file of yours, and if I overwrite it now, from a committed file, I'll be clobbering your data, so maybe you should move it out of the way first."

    [4] Or saved for as long as you don't tell the system to forget that commit, or whatever. The details of this vary, quite a lot, from one version control system to another.


    Making new commits: Git's index

    In other version control systems (VCSes), you check out some commit, and now you have a bunch of useful files. You make changes to those files, and when you are ready, you tell the VCS: commit these. It goes and finds what you did, and commits that. Some of these VCSes can be excruciatingly slow here. Git tends to be blazing-fast. It gets this speed at a price. That price has useful (to you) side benefits, but it's definitely confusing, and it's time to learn all about it.

    Instead of just having two copies of each file, Git stores three. One of the three is the frozen (and de-duplicated) file in the current commit. You picked out some commit to work on, so that commit—with its big ugly hash ID—is the current commit, and that commit has a snapshot of all files.

    At the opposite end, as it were, Git has copied all of those files out of the commit, into your work-tree. These are ordinary everyday files that you can do anything with.

    Between these two copies, though, Git keeps a third "copy". The word "copy" is in quotes here because this third one is in the frozen form, and is pre-de-duplicated. Initially, all of these match the copies in the commit. This extra copy lives in something that Git calls the index, or the staging area, or sometimes—rarely these days—the cache. All three of these names are for the same thing. It has three names perhaps because index doesn't mean anything, and cache is too specific: the name staging area reflects its role.

    When you go to make a new commit, Git uses the ready-to-go files that are in Git's index. Since they're in the right format, Git can make a new commit very quickly. But this means that if you change your copy in your work-tree, you have to tell Git to replace the index copy. The copy in the index is in the frozen format but isn't in a commit and therefore is not actually frozen.

    The git add command is how you do all this. What git add file really means is make the index copy of file match the work-tree copy. Git will replace the old index copy with a new one, compressing and de-duplicating the file at git add time, to make it ready to be committed. This means that instead of git commit being slow, it's git add that's slow—but you only have to do it on files that you changed, so it's not really that slow.

    All of this, in turn, means that what's in the index—or staging area—is, in effect, your proposed next commit. Git has filled it in from the current commit, when Git extracted that commit. Git copied the commit to both Git's index and your work-tree. Now that you've changed stuff, or maybe added or even removed some files, you must update Git's index to match. You do this with git add, or git rm if you want to just remove stuff. This updates Git's index, and hence your proposed next commit.
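    The three copies can be observed directly by comparing blob hash IDs. This is a sketch in a scratch repository (the file name is invented): git rev-parse HEAD:file.txt names the committed copy, git ls-files -s shows the index copy, and git hash-object computes what the work-tree copy would hash to.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt && git commit -q -m "first"

# Edit the work-tree copy only; the index copy still matches the commit.
echo "v2" > file.txt
committed=$(git rev-parse HEAD:file.txt)
index_before=$(git ls-files -s file.txt | cut -d' ' -f2)
worktree=$(git hash-object file.txt)

# git add replaces the index copy with the current work-tree content.
git add file.txt
index_after=$(git ls-files -s file.txt | cut -d' ' -f2)

echo "committed:    $committed"
echo "index before: $index_before"
echo "work-tree:    $worktree"
echo "index after:  $index_after"
```

    Before git add, the index agrees with the commit; afterwards it agrees with the work-tree, which is exactly the "proposed next commit" idea.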

    Making new commits: updating a branch name

    Before we move on to how merge works, let's take a moment to observe the process of making regular everyday non-merge commits—commits with just one parent, in other words. We start with a simple linear chain of commits, ending with some particular last commit with a hash ID:

    ... <-F <-G <-H   <-- somebranch (HEAD)
    

    Here H stands in for the actual hash ID of the last commit in the chain. Commit H holds a snapshot and metadata. Git can find the commit by its hash ID, and the hash ID is in the name somebranch. In the metadata for commit H, Git can find the hash ID of earlier—parent—commit G, so using somebranch to find H lets Git find G. Commit G of course has a snapshot and metadata, and the metadata include the hash ID of its parent F. This has a snapshot and a hash ID again. So given just the branch name, Git can find all the commits.

    Let's make a second branch name that points to the same commit:

    ...--F--G--H   <-- somebranch (HEAD), anotherbranch
    

    We're still using commit H. The (HEAD) here tells us that we're using the name somebranch to find commit H. If you git checkout anotherbranch, we'll start using the name anotherbranch instead, but still find commit H:

    ...--F--G--H   <-- somebranch, anotherbranch (HEAD)
    

    If you now modify some files and git add them to put the updated files into Git's index, you can now run git commit to make a new commit. Git will:

    • gather metadata, such as your name and email address and the current date and time (and your commit message and sometimes more stuff);
    • use the hash ID of the current commit for the new commit's parent;
    • use whatever is in Git's index / staging-area for the new commit's snapshot;
    • write out the new commit, which assigns it its new hash ID, but we'll just call that I.

    There's one more step, but let's draw commit I now:

    ...--F--G--H
                \
                 I
    

    Now let's add the branch names, after we note that the last step for git commit is that Git writes the new hash ID into the current branch name—the one with the attached HEAD:

    ...--F--G--H   <-- somebranch
                \
                 I   <-- anotherbranch (HEAD)
    

    Now commits up through H are on both branches, and new commit I is only on anotherbranch.
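    The same picture can be reproduced in a scratch repository. This sketch reuses the branch names somebranch and anotherbranch from the drawings; the file and its contents are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit H, reachable from two branch names.
echo "v1" > file.txt
git add file.txt && git commit -q -m "H"
git branch -M somebranch
git branch anotherbranch
git checkout -q anotherbranch
H=$(git rev-parse HEAD)

# Commit I: only the branch with HEAD attached moves.
echo "v2" > file.txt
git commit -q -am "I"

some=$(git rev-parse somebranch)
another=$(git rev-parse anotherbranch)
parent_of_I=$(git rev-parse anotherbranch^)
git log --oneline --graph --all --decorate
```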

    Merging

    We are now ready to tackle Git's merge operation. Let's consider, first, these facts:

    • Each commit holds a snapshot—not changes, just a snapshot.
    • We can turn a snapshot into changes if we pick some other—usually earlier—snapshot, and compare the two. It's just a simple game of spot the difference, with the computer doing all the spotting.
    • The goal of a merge is to combine changes.

    We start with a situation like this:

              I--J   <-- branch1 (HEAD)
             /
    ...--G--H
             \
              K--L   <-- branch2
    

    That is, the name branch1 selects some commit—which we'll call J—and the name branch2 selects some other commit that we'll call L. The one we are using right now is J: that's what's in Git's index and in our work-tree.

    When we run git merge branch2, Git uses HEAD to locate our commit J, and uses the name branch2 that we gave as an argument to locate their commit L. But now Git needs to figure out what we changed and what they changed. That means Git has to find some earlier commit.

    The right earlier commit is not always obvious, but what Git needs is a commit that is on both branches. Commit H is on both branches; so is commit G, and anything earlier. It kind of stands to reason, though, that the best commit is probably the one "closest to the ends": that is, commit H is "better" than commit G, because comparing the snapshot in H against either later commit will probably find fewer changes than comparing the snapshot in G, or anything earlier.

    We call this "right commit" the merge base, and in any case, Git finds the merge base on its own here. In this easy case, it's easy to see that Git will pick commit H. In more complex graphs, using git merge-base --all may be the only sane way to see what Git is picking.[5]

    To find what we changed, Git now runs, in effect:

    git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
    

    A very similar command finds what they changed:

    git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed
    

    Git also, at this point, actually reads all three commits into the index.
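    Here is a sketch of finding the merge base and running the two diffs by hand, assuming a scratch repository shaped like the drawing above (one commit per side, to keep it short). The file names a.txt and b.txt are invented.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit H, the shared starting point.
echo "line" > a.txt
echo "line" > b.txt
git add a.txt b.txt && git commit -q -m "H"
git branch -M branch1
H=$(git rev-parse HEAD)

# Their side (L) touches b.txt; our side (J) touches a.txt.
git checkout -q -b branch2
echo "their change" >> b.txt && git commit -q -am "L"
git checkout -q branch1
echo "our change" >> a.txt && git commit -q -am "J"

base=$(git merge-base --all branch1 branch2)
echo "merge base: $base"
ours_changed=$(git diff --find-renames --name-only "$base" branch1)
theirs_changed=$(git diff --find-renames --name-only "$base" branch2)
echo "we changed:   $ours_changed"
echo "they changed: $theirs_changed"
```

    Dropping --name-only shows the full diffs that Git would combine.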


    [5] Git uses a lowest common ancestor algorithm here. When applied to a Directed Acyclic Graph, there may be more than one LCA. The --all to git merge-base --all tells this command to print out all LCAs. Different merge strategies may use just one merge base, or all of them; we won't go into the details here.


    Merging really takes place in the index

    Earlier we saw that the index had a copy of each file. This is the normal state for the index, but during a merge, the index actually expands. Instead of one copy, it holds three:

    • Git reads the merge base commit into the index at "slot 1".
    • Git reads our commit into the index at "slot 2".
    • Git reads their commit into the index at "slot 3".

    That is, if the merge base, our commit, and their commit all have a README.md file, the index now has three README.md files in it. We can name these using a digit and some colons, with some Git commands,[6] e.g.:

    git show :1:README.md   # view the merge base copy
    git show :2:README.md   # view the `--ours` copy
    git show :3:README.md   # view the `--theirs` copy
    

    This repeats for every file in the three commits. Some of the commits might not have all the file names, and the git diff --find-renames above might find that, from commit H to L, they renamed some file, for instance; in this case the index entries are a little trickier. Or perhaps we or they deleted a file, or added a whole new file, in which case there's no slot-1 entry but there is a slot-2 or slot-3 entry. You have these cases, so we can't ignore them. But they're a little more complicated, so for now, we will ignore them. The rest is pretty straightforward:

    • If all three index entries match, nobody touched the file at all. Any of the three copies will serve as the merged file.
    • If two of three copies match, then either we changed the file (base and theirs match) and Git should use ours, or they changed the file (base and ours match) and Git should use theirs, or we and they made the same change to the file (theirs and ours match) and Git should use either of these versions. So Git will use ours or theirs, whichever one doesn't match the base.
    • If all three copies are different, Git has to do some real work.

    If Git was able, from the above, to figure out which version to use, Git just moves that version from these nonzero numbered slots to slot number zero, and erases the higher-numbered slots. A slot-zero entry is the normal "this is the file, ready to be committed" copy. So that file is now resolved. Git puts the chosen copy into your work-tree as well.

    If not, Git goes on to try a low-level merge of the file.[7]
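    The numbered slots are visible with git ls-files -u while a merge is stopped. A sketch, assuming a scratch repository where both sides rewrite the same one-line README.md so that the merge is guaranteed to conflict:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README.md
git add README.md && git commit -q -m "H"
git branch -M branch1
git checkout -q -b branch2
echo "theirs" > README.md && git commit -q -am "L"
git checkout -q branch1
echo "ours" > README.md && git commit -q -am "J"

# The merge stops; the index now holds three entries for README.md.
git merge branch2 || true

# Each output line is: mode, blob hash ID, slot number, path.
git ls-files -u
stages=$(git ls-files -u | awk '{printf "%s", $3}')
slot1=$(git show :1:README.md)
slot2=$(git show :2:README.md)
slot3=$(git show :3:README.md)
echo "slot 1 (base): $slot1, slot 2 (ours): $slot2, slot 3 (theirs): $slot3"
```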


    [6] Most Git commands that can take a hash ID can take names that Git resolves into a hash ID. This resolution is done through the rules outlined in the gitrevisions documentation. So git rev-parse :1:README.md prints out the internal blob hash ID for that file. When using git show or git cat-file -p, you can give it either the hash ID, or the name; they'll run the name through an internal rev-parse as needed.

    [7] You can specify a merge driver instead of letting Git use its built in one. This also gets somewhat complicated.


    Low-level merging is normally done diff-hunk-by-diff-hunk

    Suppose that we have three different versions of run.py in the index, and the diff from base to ours says to make a change to line 42, while the diff from base to theirs says to make a different change to line 54. Git will simply take both changes and apply them to the merge base copy of the file.

    If we and they changed the same line(s), Git will compare what we both used as the new replacement(s) for them. If our replacements match, Git will take one copy of this change.

    If we and they changed the same lines but to different text, Git will declare a merge conflict in this file, and will arrange for the merge to stop in the middle. The extended (-X) options can tell Git not to stop after all (by telling it to favor ours or theirs), but we'll skip over these.

    If there are no merge conflicts after combining our changes and their changes, Git will, as usual, put the result into index slot zero and your work-tree. This file is also resolved.

    If Git isn't able to resolve the conflict, the low level merge code will write its best-effort at merging the three files to your work-tree. (What happens to Git's index, well, we'll leave that for the next section.) The work-tree file will use the combined changes wherever they didn't conflict, and where they did, will contain lines from both "sides" of the merge. If you set merge.conflictStyle to diff3, the conflicted region will include the corresponding lines from the merge-base version of the file. I like to set this option always; I find the resulting conflicts easier to read.
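    To see what a diff3-style conflict looks like in the work-tree file, here is a sketch in a scratch repository. The three-line run.py is a stand-in, and the option is passed with -c for this one command rather than set permanently.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\ntwo\nthree\n' > run.py
git add run.py && git commit -q -m "H"
git branch -M branch1
git checkout -q -b branch2
printf 'one\nTWO (theirs)\nthree\n' > run.py && git commit -q -am "L"
git checkout -q branch1
printf 'one\nTWO (ours)\nthree\n' > run.py && git commit -q -am "J"

# Both sides changed line 2 differently, so the low-level merge conflicts.
git -c merge.conflictStyle=diff3 merge branch2 || true

# The work-tree copy has our lines, then the merge-base lines, then their lines.
cat run.py
```

    The section between the ||||||| and ======= markers is the merge-base text; without diff3 that section is omitted. To make the setting permanent, git config --global merge.conflictStyle diff3 would do it.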

    High level conflicts, also known as tree conflicts

    In the section above, I talked about how Git handles conflicts within the three versions of some file, where there is a merge base copy, the --ours copy, and the --theirs copy, and all three differ. But let's see what happens with these cases:

    • Suppose they delete a file and we don't do anything to it. What should Git do with this? Git's answer is take the deletion: Git keeps the file deleted in the merge result, by emptying out all index slots, including slot zero, and making sure that the file isn't there in your work-tree.

    • Suppose we delete a file and they don't do anything to it. Git handles this the same way.

    • Suppose we, or they, delete a file, and they or we—the other side—modify the file. Git's answer is to declare a merge conflict and just leave two of the three copies in the index. Git calls this a modify/delete conflict.

    • Suppose we rename a file (without changing its content), and they don't change it, or do change it but don't rename it. Git's answer is to combine both changes: take their changes if any, and use our new file name. The same applies if they rename it and we don't. If we both modify the file, and the low level code can combine the content changes, Git resolves the file by taking both the rename and the combined content change.

    • If we both rename the file, but to different new names, Git calls this a rename/rename conflict.

    • If we both create all-new files, with different content but the same name, Git calls this an add/add conflict.

    These conflicts that involve file names or entire file creation/deletion are all high level or tree conflicts, because they don't involve low-level content conflicts. We can even get both high and low level conflicts, e.g., with a rename/rename conflict plus a low level conflict; but the main point here is that if we do get one of these high level conflicts, the extended (-X ours and -X theirs) options have no effect: those options are only handled by the low-level merge code.[8]

    In any case, if Git does stop with a merge conflict, it leaves the nonzero slot number entries in its index. This leaves the two or three input files available to commands like git mergetool, and leaves enough traces for git mergetool to diagnose high level conflicts such as modify/delete conflicts.
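    A modify/delete conflict leaves a recognizable trace in the index: only two of the three slots are filled. A sketch, assuming a scratch repository in which we modify old.txt (an invented name) and they delete it:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > old.txt
git add old.txt && git commit -q -m "H"
git branch -M branch1
git checkout -q -b branch2
git rm -q old.txt && git commit -q -m "L: delete old.txt"
git checkout -q branch1
echo "v2" > old.txt && git commit -q -am "J: modify old.txt"

# High level conflict: we modified the file, they deleted it.
git merge branch2 || true

# Slots 1 (base) and 2 (ours) are present; slot 3 (theirs) is missing.
git ls-files -u
stages=$(git ls-files -u old.txt | awk '{printf "%s", $3}')
# "UD" means unmerged on our side, deleted on theirs.
status=$(git status --porcelain -- old.txt)
echo "slots: $stages, status: $status"
```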


    [8] There may, in the future, be some fancier high level conflict handlers that do allow some -X options. But today there aren't.


    Your job: clean up the mess

    We now know what kind of mess Git leaves behind:

    • Some work-tree files may have merge conflicts in them.
    • If this is the case, the index holds some nonzero slot number entries for all of them.
    • The index may also hold nonzero slot number entries for files that don't have low level merge conflicts. All merge conflicts show up in the index, through these nonzero slot numbers; the low-level ones also show up as partially merged work-tree files with conflict markers.

    Your job is to finish the merge. You may do this any way you like.

    • You don't have to use the higher-numbered index entries, but if you want to, git mergetool gives you a convenient way to access them, that does not require fumbling around with git show :1:file.ext, git show :2:file.ext, and git show :3:file.ext and a lot of temporary files: git mergetool does that for you.

    • You don't have to use the work-tree copies of the files, with their partial merges.

    • You do have to run git add or git rm, but git mergetool can do that for you too. To mark the conflict resolved, you will either remove the index copies entirely—meaning that the final commit won't have the file at all—or write, to index slot zero, the correct merge result.

    Your particular case

    In your particular case, you have __pycache__/*.pyc files listed (four of them) and two other files, app.db and run.py.

    The __pycache__ files should almost never be in a Git repository. Your merge conflict for one of them shows that one side of the merge—the --ours side, i.e., merge base vs HEAD—had modified the file, while the other side of the merge had removed the file, in the two git diffs that git merge ran.

    The correct resolution here would be to take their change, i.e., to remove the file entirely. For git mergetool, then, the answer would be d: use the deletion, rather than keeping your modified file.
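    Outside of git mergetool, the same "take the deletion" resolution is a git rm on the conflicted path. This sketch assumes a scratch repository with one tracked bytecode file; the file name run.cpython-39.pyc is hypothetical, and git add -f is used only in case an ignore rule on your machine already covers bytecode files.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

pyc=__pycache__/run.cpython-39.pyc
mkdir __pycache__
echo "print('hi')" > run.py
printf 'bytecode v1' > "$pyc"
git add -f run.py "$pyc" && git commit -q -m "H"
git branch -M main

# Their side stops tracking the bytecode file; our side rebuilds it.
git checkout -q -b reset-password
git rm -q "$pyc" && git commit -q -m "stop tracking bytecode"
git checkout -q main
printf 'bytecode v2' > "$pyc" && git commit -q -am "rebuilt bytecode"

git merge reset-password || true

# Resolve the modify/delete conflict by taking the deletion.
git rm -q -- "$pyc"
unmerged=$(git ls-files -u)
git commit -q --no-edit
tracked=$(git ls-files -- __pycache__)
```

    With four such files, the git rm line would be repeated (or given all four paths) before committing.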

    For app.db, the correct result is probably not your file, but might not be their file either. The correct result might be some combination of both files. If the database is binary, Git's simple line-based text substitution rules, which combine two git diffs and apply the combined changes to the merge base copy, simply don't work at all. It's up to you how to produce the correct final app.db copy, but let's assume there is a magic command that can read both app.db input files and produce the right result. You might run:

    git show :2:app.db > app.db.mine
    git show :3:app.db > app.db.theirs
    magic-combiner -o app.db app.db.mine app.db.theirs
    

    which combines them and writes the correct combined data to app.db. Now that your work-tree copy is what you want to commit, you would just run:

    git add app.db
    

    This erases the three numbered slots (:1:app.db, :2:app.db, and :3:app.db are all gone) and copies (and compresses and freezes and de-duplicates) the current app.db into index-slot-zero.
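    If no combining tool exists and one side's database is simply the one you want, a cruder alternative is to take that side's copy wholesale. The sketch below does this with git checkout --theirs in a scratch repository; it is an assumption that their app.db is the right answer, and this discards whatever was only in your copy. The NUL byte in the fake database contents is there so that Git treats the file as binary.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'db\000base' > app.db
git add app.db && git commit -q -m "H"
git branch -M main
git checkout -q -b reset-password
printf 'db\000theirs' > app.db && git commit -q -am "their db"
git checkout -q main
printf 'db\000ours' > app.db && git commit -q -am "our db"

# Git cannot merge binary content, so this stops with all three slots filled.
git merge reset-password || true
theirs_blob=$(git rev-parse :3:app.db)

# Copy slot 3 into the work-tree, then mark the file resolved.
git checkout --theirs -- app.db
git add app.db

# Only a slot-zero entry remains, holding their content.
resolved=$(git ls-files -s app.db)
resolved_blob=$(echo "$resolved" | cut -d' ' -f2)
unmerged=$(git ls-files -u)
echo "$resolved"
```

    Using --ours instead keeps your copy. Either way the merge is not finished until you commit.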

    For run.py, perhaps you should look at their file and your file, and perhaps the merge base version as well, in an editor or merge tool or whatever you will use to figure out what the correct merge result is. Or perhaps the work-tree copy, with Git's attempt at merging, is sufficient for you to figure out what should be in that file. The git mergetool command is likely to offer you a way to run some merge tool over all three inputs. I prefer to just edit run.py in the editor and figure it out (using the three sections from my diff3 setting for merge.conflictStyle) in most cases.

    If you have git mergetool run a tool, then:

    • either git mergetool knows a lot about this tool and can trust it to exit with a status code that says "all merged, use the result" or "not merged, don't use" and git mergetool will run git add for you or not, correctly; or
    • git mergetool doesn't know enough about the tool, but will run it and then ask you if it should use the result.

    If git mergetool uses the result, it will do its own git add run.py. If not, you still have the three copies in the index; you can open run.py in your favorite editor, look it over, and decide whether it's all correct or needs more changes. You can run tests, and so on.

    Even if git mergetool does add the file, you can still look it over and run tests. Resolving the file just means getting the index set up so that Git thinks the merge is done.

    Committing the final merge

    If Git thinks it did the merge on its own, Git will make a new merge commit:

              I--J
             /    \
    ...--G--H      M   <-- branch1 (HEAD)
             \    /
              K--L   <-- branch2
    

    This merge commit has a hash ID, like any commit. It has a snapshot, like any commit. It has metadata, like any commit—with one difference: it lists commit J as its first parent, so that M points back to J, but then it also lists commit L as its second parent, so that M also points back to L. Now commits H-I-J-M are on branch1 (plus earlier commits) but so are H-K-L-M (plus earlier commits). So now all the commits that were only on branch2 before, are on both branches. New commit M is only on branch1, and—as usual—is the new tip of the branch: Git wrote M's hash ID into the name branch1.

    If Git doesn't make the merge commit on its own due to merge conflicts, you:

    • clean up the mess, then
    • run git merge --continue or git commit,[9]

    and Git will now make merge commit M as before, with the two parents. Or, you can run:

    git merge --abort
    

    to erase the index (well, reset it to match J, really) and put your work-tree back to matching commit J, and you'll be back in the situation you were in before you started the merge. (Any work you did to resolve the merge is gone, so be a bit careful here!)
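    Both endings, aborting and finishing, can be tried safely in a scratch repository. In this sketch the "resolution" is just overwriting run.py with a placeholder line; in real life that step is the actual editing work.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > run.py
git add run.py && git commit -q -m "H"
git branch -M branch1
git checkout -q -b branch2
echo "theirs" > run.py && git commit -q -am "L"
git checkout -q branch1
echo "ours" > run.py && git commit -q -am "J"
J=$(git rev-parse branch1)
L=$(git rev-parse branch2)

# Ending 1: give up. HEAD, index, and work-tree go back to commit J.
git merge branch2 || true
git merge --abort
after_abort=$(git rev-parse HEAD)
content_after_abort=$(cat run.py)

# Ending 2: start again, resolve, and commit the merge.
git merge branch2 || true
echo "resolved" > run.py
git add run.py
git commit -q --no-edit

# Merge commit M has J as first parent and L as second parent.
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
git log --oneline --graph
```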


    [9] All git merge --continue really does is make sure that you're in the middle of a merge, then run git commit. So it's a bit of safety, in that it won't do anything if you think you're in a conflicted merge, but somehow you aborted it earlier, or finished it. Usually in that situation git commit will tell you there's nothing to commit, too, so this is rarely important.