Tags: git, git-merge, git-remote, svnsync

Git merge does not delete files that were deleted in another repository


Two repositories exist: rep1 and rep2. Both are the result of one-way syncs from the same SVN instance: changes in SVN are synced to Git, but changes in Git are not synced back to SVN. The two repositories are not forks of each other.

rep1/master branch contains a file file1.txt

rep2/master branch contained the file file1.txt, but the file was deleted in rep2/

rep2 is added as a remote in rep1 in order to merge. (remot2)

A local branch tracking rep2/master is created:

git checkout -b rep2_master_branch --track rep2/master

A "merge" branch is created from the tip of rep1/master:

git checkout -b merge-master-branches master 

A merge is executed with the "theirs" strategy option and --squash. --allow-unrelated-histories is required as the merge comes from another server.

git merge -Xtheirs --squash rep2_master_branch  --allow-unrelated-histories

What is observed is that rep1/file1.txt is NOT deleted.

Further observations: conflicts during the merge are resolved correctly, as are modified and added files. The only files handled incorrectly seem to be those that were deleted in rep2.

How can I work around this? Or, better yet, how can this be resolved?


Solution

  • As far as Git is concerned, that is the correct merge result.

    Git's git merge works by comparing two branch tips to a common merge base. That is, if you draw the commit graph, and it looks like this:

                 o--o--L   <-- branch1
                /
    ...--o--o--*
                \
                 o--o--R   <-- branch2
    

    then the result of merging branch1 (commit L) with branch2 (commit R) will be computed by comparing the contents of commit *, the common point from which both branches derive, to the contents of commit L and to the contents of commit R.

    Whatever changed from * to L happened in the left side. Whatever changed from * to R happened in the right side. Git combines these two "things that happened", applies those combined changes to the contents of *, and if all goes well, makes a new commit from the result.
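
The ordinary case can be tried out in a throwaway repository. The following is only a minimal sketch: the branch names match the drawing, while the file names, commit messages, and the user identity are made up for the demonstration. Because file1.txt exists in the merge base, its removal on branch2 counts as a change, and the merge carries it out.

```shell
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Commit *: the merge base, containing both files.
echo one > file1.txt
echo two > file2.txt
git add .
git commit -q -m "base"
git branch -M branch1             # name the current branch branch1

# Commit R on branch2: delete file1.txt.
git checkout -q -b branch2
git rm -q file1.txt
git commit -q -m "R: delete file1.txt"

# Commit L on branch1: an unrelated edit to file2.txt.
git checkout -q branch1
echo more >> file2.txt
git commit -q -a -m "L: edit file2.txt"

# Base-to-R deleted file1.txt and base-to-L left it alone,
# so the merge result no longer contains it.
git merge -q --no-edit branch2
merged_files=$(git ls-files)
echo "$merged_files"              # prints: file2.txt
```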

    But you are using --allow-unrelated-histories. This is required only when there is no common commit: we draw the history of L and R and find that they never diverged from a common point. For instance, we might get:

    A--o--o--...--L   <-- branch1
    
    B--o--...--o--R   <-- branch2
    

    where A and B are both root commits. So what should Git use as the common starting point, to figure out what changed since then?

    One can argue for various starting-points of one's choice, but Git's answer is: Use an empty commit. That is, with --allow-unrelated-histories, Git pretends there's a common commit that has no files at all:

      A--o--o--...--L   <-- branch1
     /
    *
     \
      B--o--...--o--R   <-- branch2
    

    Comparing commit *, which is empty, to commit L, Git finds that every file on branch1 is newly created in the form that it has in L. Comparing * to R, Git finds that every file on branch2 is newly created in the form that it has in R. Wherever those files match, everything is fine: we just take that file. Wherever L has a file that R doesn't or vice versa, everything is fine: we just take that file. Wherever both L and R have a file, but their contents differ, we have a merge conflict.

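    Since you used -X theirs, Git resolves each such conflict by taking "their" version of the file ("theirs" being the commit you named, rather than the one you were on as HEAD). A file such as file1.txt, which exists in L but not in R, is not a conflict at all: compared to the empty merge base it looks like a file added on one side, not a file deleted on the other, so Git keeps it.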
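
The situation from the question can be reproduced in a single scratch repository. This is a sketch under assumptions: branch1 and branch2 stand in for rep1/master and rep2/master, an orphan branch plays the role of the second repository, and the file contents are invented. It shows that file1.txt is kept and that the differing file2.txt is resolved in favor of branch2.

```shell
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# branch1 (stand-in for rep1/master): file1.txt is present.
echo one > file1.txt
echo two > file2.txt
git add .
git commit -q -m "L: rep1 state"
git branch -M branch1

# branch2 (stand-in for rep2/master): a separate root commit without file1.txt.
git checkout -q --orphan branch2
git rm -q -r -f .
echo two-from-rep2 > file2.txt
git add .
git commit -q -m "R: rep2 state"

git checkout -q branch1
# No shared commit exists, so merge-base prints nothing and exits non-zero.
git merge-base branch1 branch2 || echo "no merge base"

git merge -q -X theirs --squash --allow-unrelated-histories branch2
kept=$(git ls-files file1.txt)    # file1.txt is still staged in the result
content=$(cat file2.txt)          # differing file taken from branch2
echo "$kept"
echo "$content"
```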

    If you want Git to pretend that some other commit is the merge base, you can use git replace --graft to insert some fake parent linkage. For instance, suppose you want to fake-tie the two together at arbitrarily-chosen commit C:

    A--o--o--...--L   <-- branch1
             /
    B--...--C--...--R   <-- branch2
    

    so that Git will compare C to L, and C to R. As far as git merge itself goes, it doesn't matter which commit you choose in the A--...--L chain; any commit will do (including A and L themselves), but note that Git will only "see" changes since the commit you pick (due to diffing C-vs-L, and C-vs-R). So, pick some appropriate commit in the chain and run:

    git replace --graft <hash-ID-of-chosen-commit> <hash-ID-of-C>
    

    and Git will now use the replacement (graft) commit instead of the chosen commit. A git merge will now use the contents of C as the common source. (You can delete the graft as soon as git merge exits, even if it exits with a merge conflict. Hence you can write a tiny script that inserts the graft, runs git merge, and deletes the graft, to merge with an arbitrarily chosen ancestor. Graft the chosen ancestor in as the parent of L if it lies on the B--...--R line, or as the parent of R if it lies on the A--...--L line.)
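
Such a script might look like the sketch below. It is an illustration under assumptions, not a drop-in tool: the scratch repository, file names, and commit messages are invented, the chosen commit is L itself, and C is taken to be the parent of R, a commit in which file1.txt still exists with the same content as in L. With C as the merge base, the deletion made between C and R becomes visible to the merge.

```shell
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# branch1: root commit L, with file1.txt present.
echo one > file1.txt
echo two > file2.txt
git add .
git commit -q -m "L: rep1 state"
git branch -M branch1

# branch2: unrelated root commit C with the same files, then R deletes file1.txt.
git checkout -q --orphan branch2
git commit -q -m "C: rep2 state before the deletion"
git rm -q file1.txt
git commit -q -m "R: delete file1.txt"
git checkout -q branch1

chosen=$(git rev-parse branch1)   # commit that receives the fake parent (L)
base=$(git rev-parse branch2~1)   # commit to use as the merge base (C)

# Temporarily pretend that C is the parent of L.
git replace --graft "$chosen" "$base"

status=0
git merge -q --squash branch2 || status=$?

# Remove the graft whether or not the merge succeeded.
git replace -d "$chosen" >/dev/null
[ "$status" -eq 0 ]

git commit -q -m "Squash-merge branch2 using C as the merge base"
remaining=$(git ls-files)
grafts_left=$(git replace -l)
echo "$remaining"                 # prints: file2.txt
```

Note that --allow-unrelated-histories is no longer needed here, because the graft gives the two branches a common ancestor for the duration of the merge.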