Tags: git, git-merge, git-rebase

Does git-merge add files from the "to-be-merged" branch to head or not


My team and I are working on a project under an agile development process. Every iteration, several branches are created off of the test branch, one per task. When the iteration is finished, all task branches are merged into the test branch, and the test branch is merged into master.

The problem I am running into is merging test into master. In this specific case, master is quite behind and test has many files that aren't even included in master. So when I go to compare test and master, through a pull request, it acts like all the new files in test won't be added to master.

Is this how git-merge works? It only updates the changed files in master and doesn't care about added files?

So is my only option to do a git-rebase? If so, how does that work?

EDIT: I may have found the source of my issue. What if some task branches were branched off of master at the beginning of the iteration and then merged into test at the end? Would that cause issues when merging test into master? Whenever I view my test branch on GitHub, it says, "This branch is 7 commits ahead, 5 commits behind master".


Solution

  • In light of your edit, let's break this into two parts.

    First, your direct questions about how different git operations should behave:

    Is this how git-merge works? It only updates the changed files in master and doesn't care about added files?

    No; any change on the branch being merged, including the addition of a new file, should be incorporated by the merge. If this isn't happening, something else is going on.

    So is my only option to do a git-rebase?

    With a couple of exceptions based on edge cases, you should expect the resulting content (TREE) to be the same for a rebase as for a merge. Whatever is keeping your merge from performing as expected would likely also cause a rebase to behave in an unexpected way.

    The choice to rebase or to merge is (in theory) only about how history is preserved.
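
    As a quick sanity check of that claim, here is a minimal sketch in a throwaway repository. The branch names topic, merged and rebased, the file names, the demo identity and the commit helper are all made up for illustration, and it only covers the simple, conflict-free case. It merges and rebases the same two lines of work and compares the resulting tree IDs.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Demo"               # throwaway identity, local to this repo
git config user.email "demo@example.com"
git config commit.gpgsign false

# Hypothetical helper: append a line to FILE ($2) and commit it with message MSG ($1).
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "A" a.txt
git branch topic
commit "work on master" b.txt
git checkout -q topic
commit "add foo.txt on topic" foo.txt

# Variant 1: merge topic into a copy of master.
git checkout -q -b merged master
git merge -q -m "merge topic" topic

# Variant 2: rebase a copy of topic onto master.
git checkout -q -b rebased topic
git rebase -q master

merge_tree=$(git rev-parse 'merged^{tree}')
rebase_tree=$(git rev-parse 'rebased^{tree}')
echo "merge tree:  $merge_tree"
echo "rebase tree: $rebase_tree"
```

    If the two IDs match, the file content is identical and only the shape of the history differs.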

    Second: Ok... so what's really going on here?

    It's hard to say exactly what's happening without more information about the current state of the repo. But I can draw up a few scenarios and talk a bit about how merge works, and maybe that will give you enough to go on that you can troubleshoot it...

    Say you initiate a merge like this:

    git checkout branchA
    git merge branchB
    

    Now git is going to identify three commits:

    • ours - the commit that's checked out (i.e. what branchA points to)
    • theirs - the commit whose history is being merged in (i.e. what branchB points to)
    • base - a "merge base" between ours and theirs. Typically this can be summed up as "the most recent common ancestor of ours and theirs"

    So if you have

    A -- B -- C -- D -- O <--(branchA)
          \
           E -- F -- T <--(branchB)
    

    then O will be ours, T will be theirs, and B will be base.

    Next git will calculate a patch (diff, roughly) between B and T - which we can call "their changes" - and also a patch between B and O - which we can call "our changes".

    If commit E adds a previously-nonexistent file foo.txt, then there will be no change for that file in our changes (because it didn't exist in either B or O), but foo.txt will be a new file in their changes (because it didn't exist in B but it did exist in T).

    Now git wants to create a combined patch that contains our changes and their changes. This is where conflicts can turn up, if our changes and their changes both attempt to modify the same hunk of code. But in our "new file" scenario, there wouldn't be a conflict since our changes say nothing about foo.txt. The combined patch should include "new file foo.txt" with the file's content as it appears at T.
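
    The "new file" case can be tried out in a throwaway repository. This is only a sketch: it uses a shortened version of the graph above (one commit per side), and the file a.txt, the demo identity and the commit helper are assumptions made for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchA  # fix the branch name regardless of local defaults
git config user.name "Demo"               # throwaway identity, local to this repo
git config user.email "demo@example.com"
git config commit.gpgsign false

# Hypothetical helper: append a line to FILE ($2) and commit it with message MSG ($1).
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "B" a.txt                    # will become the merge base
git branch branchB
commit "O" a.txt                    # ours: modifies a.txt only
git checkout -q branchB
commit "T: add foo.txt" foo.txt     # theirs: adds foo.txt
git checkout -q branchA

base=$(git merge-base branchA branchB)
our_changes=$(git diff --name-status "$base" branchA)
their_changes=$(git diff --name-status "$base" branchB)
echo "our changes:   $our_changes"
echo "their changes: $their_changes"

git merge -q -m "merge branchB" branchB
merged_foo=$(cat foo.txt)
theirs_foo=$(git show branchB:foo.txt)
```

    Only their changes mention foo.txt, so the merge brings it in with the content it has at T.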

    So most of the time, if merge is behaving in a confusing way, you can start to understand it by knowing what is being used as the merge base.

    git merge-base branchA branchB
    

    will print out a commit ID. The question is, does the commit contain foo.txt? If it does, then you're not going to get the behavior you expect; and then the problem is reduced to "figure out why the calculated merge base contains foo.txt".
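
    One way to answer that question without checking anything out is git cat-file -e, which exits with status zero only if the named object exists. The sketch below wraps it in a helper; base_has_file is a made-up name, not a git command, and the small repository around it exists only so the example runs on its own.

```shell
set -eu

# Hypothetical helper: succeeds if FILE ($3) exists in the merge base of
# branches $1 and $2.
base_has_file() {
    base=$(git merge-base "$1" "$2") || return 2
    git cat-file -e "$base:$3" 2>/dev/null
}

# Throwaway repository, assumed for the demo only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchA
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "B" a.txt
git branch branchB
commit "O" a.txt
git checkout -q branchB
commit "T: add foo.txt" foo.txt

if base_has_file branchA branchB foo.txt; then foo_in_base=yes; else foo_in_base=no; fi
if base_has_file branchA branchB a.txt; then a_in_base=yes; else a_in_base=no; fi
echo "foo.txt in merge base: $foo_in_base"
echo "a.txt in merge base:   $a_in_base"
```

    In a real repository you would call the helper with your own branch names and the path of one of the files that fails to show up.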

    Now in the above example, this command would return the ID for B, which does not contain foo.txt. So why might it be different in your case?

    In your edit, you describe that some branches were created from master and merged to test. This is not necessarily a problem. Say we have

    A -- B <--(master)
    

    and you say

    git checkout master
    git branch test
    

    then some work occurs on test.

    A -- B <--(master)
          \
           C -- D <--(test)
    

    and now someone creates a task branch, but they create it from master.

    A -- B <--(master)
         |\
         | E -- F <--(task1)
          \
           C -- D <--(test)
    

    Then they merge task1 into test

    A -- B <--(master)
         |\
         | E ----- F <--(task1)
          \         \
           C -- D -- T <--(test)
    

    Some other work goes on affecting master

    A -- B -- G -- H -- O <--(master)
         |\
         | E ----- F <--(task1)
          \         \
           C -- D -- T <--(test)
    

    Now in this case, the fact that task1 was created "from master" doesn't matter. In fact, nothing currently stored in git even reflects the fact that it was created from master; it could just as well have been created from test before commit C.

    So the merge would still work fine.
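
    To convince yourself, the graph above can be rebuilt in a scratch repository. This is a sketch under assumptions: the file names are invented, E is taken to be the commit that adds foo.txt, and commit is a made-up helper.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Hypothetical helper: append a line to FILE ($2) and commit it with message MSG ($1).
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "A" a.txt
commit "B" a.txt
b=$(git rev-parse HEAD)

git checkout -q -b test
commit "C" c.txt
commit "D" c.txt

git checkout -q -b task1 master           # task branch created from master, not from test
commit "E: add foo.txt" foo.txt
commit "F" f.txt

git checkout -q test
git merge -q -m "T: merge task1" task1

git checkout -q master
commit "G" g.txt
commit "H" g.txt
commit "O" g.txt

base=$(git merge-base master test)
git merge -q -m "merge test" test
if [ -f foo.txt ]; then foo_after_merge=present; else foo_after_merge=missing; fi
echo "merge base is B: $([ "$base" = "$b" ] && echo yes || echo no)"
echo "foo.txt after merge: $foo_after_merge"
```

    The merge base is still B, which has no foo.txt, so the file arrives in master as part of their changes.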

    But here's a different scenario... What if the person who created task1 from master actually committed changes to master before realizing they needed to branch? So in this case we start again from

    A -- B <--(master)
          \
           C -- D <--(test)
    

    and a developer starts coding task1 on master. They commit, and you have

    A -- B -- E <--(master)
          \
           C -- D <--(test)
    

    Then someone says "hey, dude, you can't just work directly on master", and so we try to fix things.

    git checkout master
    git checkout -b task1
    # work continues on the task1 branch
    git commit
    

    And some other work happens on master, so now you have

    A -- B -- E -- G <--(master)
          \    \
           \    F <--(task1)
            \
             C -- D <--(test)
    

    which looks pretty good, but someone says "hey, we can't have the changes from E on master until test gets merged back; and we can't afford the problems that a history rewrite would cause. So..."

    git checkout master
    git revert HEAD^
    

    and we have

    A -- B -- E -- G -- ~E <--(master)
          \    \
           \    F <--(task1)
            \
             C -- D <--(test)
    

    Everything on each branch looks right, so work continues

    A -- B ------- E -- G -- ~E -- H -- O <--(master)
          \         \
           \         F <--(task1)
            \         \
             C -- D -- T <--(test)
    

    But now when you try to merge test into master, something terrible happens: git picks E as the merge base. Of the common ancestors of master and test (A, B and E), E is the one from which all the others can be reached and which none of the others can reach.

    Suppose E is the commit that added foo.txt. With E as the merge base, their changes don't touch foo.txt (it already exists in E), whereas our changes delete foo.txt (in ~E). So the merge result has no foo.txt, and because that deletion is already part of master's history rather than a change coming from the branch, the PR report says nothing about it.
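
    The whole scenario can be replayed in a scratch repository. This is a sketch, not a reconstruction of your repo: the file names, the commit helper and the assumption that E adds foo.txt are all invented. The three-dot diff is used as a stand-in for what a pull request shows, on the assumption that the PR view compares the branch against the merge base.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Hypothetical helper: append a line to FILE ($2) and commit it with message MSG ($1).
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "A" a.txt
commit "B" a.txt
git checkout -q -b test
commit "C" c.txt
commit "D" c.txt

git checkout -q master
commit "E: add foo.txt" foo.txt           # task work committed to master by mistake
e=$(git rev-parse HEAD)
git checkout -q -b task1
commit "F" f.txt
git checkout -q master
commit "G" g.txt
git revert --no-edit HEAD^ >/dev/null     # ~E: removes foo.txt from master again

git checkout -q test
git merge -q -m "T: merge task1" task1
git checkout -q master
commit "H" g.txt
commit "O" g.txt

base=$(git merge-base master test)
pr_view=$(git diff --name-only master...test)   # changes on test since the merge base
git merge -q -m "merge test" test
if [ -f foo.txt ]; then foo_after_merge=present; else foo_after_merge=missing; fi
echo "files listed for test: $(echo $pr_view)"
echo "foo.txt after merge:   $foo_after_merge"
```

    Here the merge base is E, the diff for test lists c.txt and f.txt but not foo.txt, and foo.txt is absent after the merge.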

    Do I know this is exactly what happened? No. Would I bet it was something pretty much like this? Yes. Can I spell out a procedure to fix it? Not really, because the exact steps are likely to be involved and to depend on exactly what happened.

    For now I'd start by trying to confirm if the problem is fundamentally what I've suggested.
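
    One possible way to run that check is to ask which commits on master, after the merge base, deleted the missing file. The helper name deletions_since_base is made up, and the tiny repository below is only there to make the sketch runnable; in your repo you would pass your own branch names and file path.

```shell
set -eu

# Hypothetical helper: list commits on TARGET ($1), after its merge base with
# SOURCE ($2), that deleted FILE ($3).
deletions_since_base() {
    base=$(git merge-base "$1" "$2") || return 2
    git log --format='%h %s' --diff-filter=D "$base..$1" -- "$3"
}

# Throwaway repository, assumed for the demo only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "A" a.txt
commit "E: add foo.txt" foo.txt
git branch test                           # test starts from a commit that has foo.txt
commit "G" g.txt
git revert --no-edit HEAD^ >/dev/null     # master drops foo.txt again
git checkout -q test
commit "D" c.txt
git checkout -q master

suspects=$(deletions_since_base master test foo.txt)
echo "$suspects"
suspect_count=$(printf '%s\n' "$suspects" | grep -c 'Revert' || true)
```

    A revert commit showing up in that list is a strong hint that the merge base already contains the file and that master removed it afterwards.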