I got sick of a Python god module and broke it up into two smaller modules. There was limited refactoring beyond pulling functions/objects out and putting them into files that made more logical sense. Unfortunately this is enough to confuse git's usual merge logic, as it thinks the first file was deleted and two completely new files were added. Now whenever I have to merge main I am not getting useful merges; instead I'm getting three files and am told to figure out how they differ.
Is there a way I can get a more useful merge that will show me what is changed between the original file and the two smaller files?
Yes there is. As example content for this I used the code from this answer by Onat Korucu, which is one Java file (LoginManager.java) that is then split into multiple smaller files (among others DecryptHandler.java).
The initial one-file content was checked into branch one_file, and on top of that a new commit in which the file was split up into multiple files was checked in to branch multiple_files. After that the one_file branch was updated by replacing the TODO body of one of the functions with some dummy code.
The challenge then is to bring/merge this update from the one_file branch to the multiple_files branch. Just attempting to merge one_file into multiple_files fails like you describe.
The key to getting git to do this is to do it in multiple steps, using a separate branch for each step. These steps need to be repeated separately for each file the original is split into, e.g. two times in your case.
The first step is to just rename the original file to one of the targets of the file splitting, e.g. DecryptHandler.java in my example. I created a branch intermediate/DecryptHandler/filename for this (i.e. on this branch DecryptHandler.java is now identical to the original LoginManager.java).
The second step is to update the renamed file with the content from the multiple_files branch so that DecryptHandler.java is now identical on both branches. I created a branch intermediate/DecryptHandler/content for this.
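The first two steps could be sketched as a small shell function. This is only an illustration under stated assumptions: the function name create_intermediate_branches and its arguments are hypothetical, the filename branch is assumed to start at the last commit that the two branches have in common, and the content branch is assumed to be created on top of the filename branch.

```shell
# Sketch of steps one and two for a single split target.
# The function name and its arguments are hypothetical, not from the answer.
create_intermediate_branches() {
    original=$1      # e.g. LoginManager.java
    target=$2        # e.g. DecryptHandler.java
    old_branch=$3    # e.g. one_file
    new_branch=$4    # e.g. multiple_files
    name=${target%.*}

    # Assumption: start from the last commit the two branches share.
    base=$(git merge-base "$old_branch" "$new_branch") || return 1

    # Step one: a commit that does nothing but rename the original file.
    git switch -c "intermediate/$name/filename" "$base" || return 1
    git mv "$original" "$target" || return 1
    git commit -q -m "Rename $original to $target" || return 1

    # Step two: on top of the rename, take the file's content from the split branch.
    git switch -c "intermediate/$name/content" || return 1
    git checkout "$new_branch" -- "$target" || return 1
    git commit -q -m "Update $target with content from $new_branch"
}
```

With the branch and file names from this answer it would be called as create_intermediate_branches LoginManager.java DecryptHandler.java one_file multiple_files.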
The history now looks like this in gitk:
The third step is now to merge the update from one_file into the filename branch. Since this branch contains just a rename, git detects it and does the merge without any problems.
The fourth step is now to merge the updated intermediate/DecryptHandler/filename branch into the intermediate/DecryptHandler/content branch (NB, not the other way around). This time git will give up due to a conflict; however, this is just a normal merge conflict where git has recorded all three contributing versions for the same filename, and resolving it with KDiff3 was a breeze since KDiff3 automatically resolved everything.
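Steps three and four could be sketched like this. Again the function name merge_through_intermediates and its arguments are hypothetical, and the sketch assumes the two intermediate branches already exist. Whether the second merge stops with a conflict depends on how close the incoming change is to the parts removed by the split, so the function only reports the conflict and leaves the resolution to you.

```shell
# Sketch of steps three and four; function name and arguments are hypothetical.
merge_through_intermediates() {
    name=$1          # e.g. DecryptHandler
    old_branch=$2    # e.g. one_file

    # Step three: the filename branch differs from the original only by a
    # rename, so rename detection lets git merge the update cleanly.
    git switch "intermediate/$name/filename" || return 1
    git merge --no-edit "$old_branch" || return 1

    # Step four: merge filename INTO content (not the other way around).
    # Both sides now touch the same path, so any conflict is an ordinary one.
    git switch "intermediate/$name/content" || return 1
    if ! git merge --no-edit "intermediate/$name/filename"; then
        echo "Conflict: resolve it (e.g. with 'git mergetool'), then run 'git commit'." >&2
        return 1
    fi
}
```

For the example in this answer the call would be merge_through_intermediates DecryptHandler one_file.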
At this point I could just merge intermediate/DecryptHandler/content into multiple_files and call it a day, since I only had one change affecting one file. (With two files you may need to repeat the steps for the other file.)
The history would look like this in gitk:
However, this will give a history with problematic commits that break git bisect, git test, etc., so it is better to "merge" the result without actually recording it as a merge. This can be done with git diff ... | git apply - from a branch where an actual merge is done.
So step five is to do the merge on a branch other than multiple_files, but one that starts at the same commit.
git branch intermediate/DecryptHandler/merge multiple_files
git switch intermediate/DecryptHandler/merge
git merge intermediate/DecryptHandler/content
# Conflict because same file added independently on two branches.
# Simple to resolve with KDiff3 and it is clear that we want the B version.
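# NB: resolve-conflict-using-kdiff3 is not part of stock git but a separate helper script;
# "git mergetool --tool=kdiff3" is a built-in alternative.
git resolve-conflict-using-kdiff3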
git commit
Step six is to then copy the merge commit without merging:
git switch multiple_files
git diff multiple_files intermediate/DecryptHandler/merge | git apply -
git add DecryptHandler.java
git commit -m "Stealth merge of one_file branch" -m "git diff multiple_files intermediate/DecryptHandler/merge | git apply -"
which now gives a multiple_files branch that has the "merged" content from one_file, brought in through some intermediate steps.
The history now looks like this in gitk:
Now, the history looks a bit noisy there, and most likely you want to exclude those intermediate branches from being shown now that the update is done [1]. This is simple to do by giving the --exclude=... option. This is too cumbersome to specify manually every time you start gitk, so assuming you have a wrapper script named gitkall for gitk --all, update it to the following:
#!/bin/sh
exec gitk \
"--exclude=refs/notes/*" \
"--exclude=refs/heads/hide/*" \
"--exclude=refs/heads/intermediate/*" \
--all \
"$@" &
Ignoring notes is for the saved test results from git test, and I want to hide all branches with names starting with "hide/". The third exclude is to ignore the intermediate branches in this answer. (Order is important here: the --exclude options must come before --all.)
Starting gitk with this will give a much simpler view:
[1] You could also delete them at this point; however, you would then need to recreate them the next time you want to bring in changes.
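If you do decide to delete the intermediate branches, something like the following could work. It is a sketch only: the function name delete_intermediate_branches is hypothetical, and it assumes every branch to remove lives under refs/heads/intermediate as in this answer. Note that git refuses to delete the branch that is currently checked out, so switch to e.g. multiple_files first.

```shell
# Sketch: delete every local branch under intermediate/.
# The function name is hypothetical; -D deletes even unmerged branches.
delete_intermediate_branches() {
    git for-each-ref --format='%(refname:short)' refs/heads/intermediate |
    while read -r branch; do
        git branch -D "$branch"
    done
}
```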