Tags: git, git-merge, git-merge-conflict

How to better merge changes in git when a module was split into two smaller files?


I got sick of a Python god module and broke it up into two smaller modules. There was limited refactoring beyond pulling functions/objects out and putting them into files that made more logical sense. Unfortunately, this is enough to confuse git's usual merge logic: it thinks the first file was deleted and two completely new files were added. Now whenever I have to merge main I don't get a useful merge; instead I get three files and am told to figure out how they differ.

Is there a way I can get a more useful merge that will show me what is changed between the original file and the two smaller files?


Solution

  • Is there a way I can get a more useful merge that will show me what is changed between the original file and the two smaller files?

    Yes, there is. As example content I used the code from this answer by Onat Korucu, which starts as one Java file (LoginManager.java) and is then split into several smaller files (among them DecryptHandler.java).

    The initial single-file content was committed on branch one_file, and on top of that a commit splitting the file into multiple files was made on branch multiple_files. After that, the one_file branch was updated by replacing the TODO body of one of the functions with some dummy code.

    The challenge then is to bring/merge this update from the one_file branch to the multiple_files branch. Just attempting to merge one_file into multiple_files fails like you describe.

    The key to getting git to do this is to work in multiple steps, using a separate branch for each step. These steps need to be repeated separately for each file the original is split into, i.e. twice in your case.

    The first step is to just rename the original file to one of the targets for file splitting, e.g. DecryptHandler.java in my example. I created a branch intermediate/DecryptHandler/filename for this (i.e. on this branch DecryptHandler.java is now identical to the original LoginManager.java).
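
    As a rough sketch, step one could be scripted like this. The helper name rename_step and its argument order are hypothetical, and starting the branch at the merge base of the two branches (the last commit where the file was still unsplit) is an assumption that matches the example history for the first round.

    ```shell
    # Hypothetical helper: create the "filename" branch for one split target.
    # Usage: rename_step <name> <original-file> <target-file> <old-branch> <split-branch>
    rename_step() {
        name=$1 orig=$2 target=$3 old=$4 split=$5
        # Assumption: branch off the last commit both branches share,
        # i.e. where the original file still exists in one piece.
        base=$(git merge-base "$old" "$split") || return 1
        git switch -c "intermediate/$name/filename" "$base" &&
        # Pure rename, no content change, so git can detect it exactly.
        git mv "$orig" "$target" &&
        git commit -q -m "Rename $orig to $target (no content change)"
    }
    ```

    For the example in this answer the call would be rename_step DecryptHandler LoginManager.java DecryptHandler.java one_file multiple_files.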

    The second step is to update the renamed file with the content from the multiple_files branch so that DecryptHandler.java is now identical on both branches. I created a branch intermediate/DecryptHandler/content for this.
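
    Step two could be sketched in the same way. The helper name content_step is hypothetical; it assumes the filename branch from step one already exists and that the target file differs between the two branches (otherwise there is nothing to commit).

    ```shell
    # Hypothetical helper: create the "content" branch on top of the "filename" branch.
    # Usage: content_step <name> <target-file> <split-branch>
    content_step() {
        name=$1 target=$2 split=$3
        git switch -c "intermediate/$name/content" "intermediate/$name/filename" &&
        # Overwrite the renamed file with the already-split version;
        # this updates both the working tree and the index.
        git checkout "$split" -- "$target" &&
        git commit -q -m "Reduce $target to its content on $split"
    }
    ```

    For the example this would be content_step DecryptHandler DecryptHandler.java multiple_files.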

    The history now looks like this in gitk:

    Gitk screenshot 1

    The third step is to merge the update from one_file into the filename branch. Since this branch contains just a rename, git handles the merge without any problems.
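
    A minimal sketch of step three, with a hypothetical helper name merge_into_filename. It relies on git's rename detection following the pure rename from step one; --no-edit only suppresses the commit message editor.

    ```shell
    # Hypothetical helper: bring the old branch's updates into the "filename" branch.
    # Usage: merge_into_filename <name> <old-branch>
    merge_into_filename() {
        name=$1 old=$2
        git switch "intermediate/$name/filename" &&
        # The branch differs from the old one only by a rename, so edits
        # to the original file are carried over to the renamed file.
        git merge --no-edit "$old"
    }
    ```

    For the example this would be merge_into_filename DecryptHandler one_file.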

    The fourth step is to merge the updated intermediate/DecryptHandler/filename branch into the intermediate/DecryptHandler/content branch (NB, not the other way around). This time git will stop with a conflict; however, this is just a normal merge conflict where git has recorded all three contributing versions under the same filename, and resolving it with KDiff3 was a breeze, since KDiff3 resolved everything automatically.

    At this point I could just merge intermediate/DecryptHandler/content into multiple_files and call it a day, since I only had one change affecting one file. (With two files, you may need to repeat the steps for the other file.)
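
    A sketch of step four, with a hypothetical helper name merge_into_content. If the merge stops, it lists the three index stages that git recorded for the file; resolving them is left to whichever merge tool you use (git mergetool --tool=kdiff3 is one stock way to launch KDiff3).

    ```shell
    # Hypothetical helper: merge the "filename" branch into the "content" branch.
    # Usage: merge_into_content <name> <target-file>
    merge_into_content() {
        name=$1 target=$2
        git switch "intermediate/$name/content" || return 1
        if git merge --no-edit "intermediate/$name/filename"; then
            return 0
        fi
        # On conflict the index holds three stages of the same path:
        # 1 = common ancestor, 2 = ours (content), 3 = theirs (filename).
        git ls-files -u -- "$target"
        # Resolve with a merge tool and then run `git commit`, e.g.:
        #   git mergetool --tool=kdiff3 "$target"
        return 1
    }
    ```

    A non-zero return here only means that manual resolution is still needed, not that anything went wrong.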

    The history would look like this in gitk:

    Gitk screenshot 2

    However, this gives a history with problematic commits that break git bisect, git test, etc., so it is better to "merge" the result without actually recording it as a merge. This can be done with git diff ... | git apply - against a branch where an actual merge has been done.

    So step five is to do the merge on a new branch that starts at the same commit as multiple_files, rather than on multiple_files itself.

    git branch intermediate/DecryptHandler/merge multiple_files
    git switch intermediate/DecryptHandler/merge
    git merge intermediate/DecryptHandler/content
    # Conflict because same file added independently on two branches.
    # Simple to resolve with KDiff3 and it is clear that we want the B version.
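    # Note: resolve-conflict-using-kdiff3 is not a stock git command (presumably
    # a custom alias or script); `git mergetool --tool=kdiff3` is a built-in alternative.
    git resolve-conflict-using-kdiff3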
    git commit
    

    Step six is to then copy the merge commit without merging:

    git switch multiple_files
    git diff multiple_files intermediate/DecryptHandler/merge | git apply -
    git add DecryptHandler.java
    git commit -m "Stealth merge of one_file branch" -m "git diff multiple_files intermediate/DecryptHandler/merge | git apply -"
    

    which now gives a multiple_files branch that has "merged" content from one_file through some intermediate steps.
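
    One way to double-check the stealth merge is to compare the tree objects of the two branches. This is a sketch with a hypothetical helper name same_tree; it only compares content, not history.

    ```shell
    # Hypothetical check: succeed only if two revisions have identical trees,
    # i.e. every tracked file has the same content on both.
    # Usage: same_tree <rev-a> <rev-b>
    same_tree() {
        [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
    }
    ```

    Running same_tree multiple_files intermediate/DecryptHandler/merge after step six should succeed. If it fails, the diff probably touched files other than the one passed to git add.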

    The history now looks like this in gitk:

    Gitk screenshot 3


    Now, the history looks a bit noisy there, and most likely you want to exclude those intermediate branches from being shown now that the update is done [1]. This is simple to do with the --exclude=... option. Specifying it manually every time you start gitk is too cumbersome, so assuming you have a wrapper script named gitkall for gitk --all, update it to the following:

    #!/bin/sh
    
    exec gitk \
            "--exclude=refs/notes/*" \
            "--exclude=refs/heads/hide/*" \
            "--exclude=refs/heads/intermediate/*" \
            --all \
            "$@" &
    

    Ignoring notes is for the saved test results from git test, and I want to hide all branches with names starting with "hide/". The third exclude ignores the intermediate branches from this answer. (Order is important here: the --exclude options must come before --all.)

    Starting gitk with this will give a much simpler view:

    Gitk screenshot 4
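
    The same filtering works on the command line, since git log accepts the same revision options. A small sketch, with a hypothetical helper name log_without_intermediate:

    ```shell
    # Hypothetical helper: one-line log of all refs except the intermediate branches.
    log_without_intermediate() {
        # --exclude only applies to the next --all/--branches/--glob option,
        # so it has to be given before --all.
        git log --oneline --exclude='refs/heads/intermediate/*' --all "$@"
    }
    ```

    Note that --all also includes HEAD, so the intermediate commits still show up while one of those branches is checked out.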


    [1] You could also delete them at this point; however, you would then need to recreate them the next time you want to bring in changes.
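
    If you do decide to delete them, something like the following sketch would remove every local branch under intermediate/. The helper name delete_intermediate is hypothetical, and git branch -D deletes without checking whether the branch has been merged, so use it with care.

    ```shell
    # Hypothetical helper: force-delete all local branches named intermediate/...
    delete_intermediate() {
        git for-each-ref --format='%(refname:short)' 'refs/heads/intermediate/' |
        while read -r branch; do
            # Fails for a branch that is currently checked out.
            git branch -D "$branch"
        done
    }
    ```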