
How to get the difference between the working sets (not commits) of two git branches


I have the following two branches, which differ in the files they contain:

master

 |_ document.txt
 |_ document_two.txt
 |_ document_three.txt

development

 |_ document.txt
 |_ document_two.txt
 |_ document_three.txt
 |_ virus.exe // want to get rid of that

And these are the git log results of both (from top to bottom)

master

commit: fdsdfsf1342re5252423425242234 (master, development)
add document one

commit: 563523523g233g5232sdfawe22434 (master, development)
add document two

commit: 56652u747241523g52352fsdfawew (master, development)
add document three

development

commit: fdsdfsf1342re5252423425242234 (master, development)
add document one

commit: 563523523g233g5232sdfawe22434 (master, development)
add document two

commit: 1213421g233g5232s41dfawe22434 (development)
ADD VIRUS.EXE ! XXX

commit: 5423345652u7433g52352fsdf1223 (development)
change document three completly

commit: 56652u747241523g52352fsdfawew (master, development)
add document three

Goal

I want to clean up the development branch so that it has the same source file states as master and no longer contains virus.exe.

I already found ways to have the master state overwrite the development file states.

Read here: Make the current Git branch a master branch

But I also want to get rid of files that do not exist in the working set of master. In this example that is the file virus.exe. Just creating a new development branch from master is not an option, because it is important to leave the development branch as an orphan. If I overwrite the development branch with master, or branch it off master, it loses its orphan state, since it adopts the whole history of master.

Does anybody know a way to:

  • get the differences between two branches in terms of files, not commit diffs?
  • remove all the differences found with one final commit (to keep the history intact)?

Edit: I found the command to give me the difference in files:

$ git diff-tree -r --name-status --diff-filter=A master..development
A       virus.exe

Is there an elegant way to use this output directly to remove the files from the development branch? I would like to use it in something like:

git checkout development && git rm virus.exe && git commit -m "clean development" && git push origin

Solution

  • Branches don't have working sets. (Well, maybe they do, since you have not defined working set, but Git does not define working set either. But since I don't know what you mean, I'm using working set as an alias for work-tree, which is well-defined, and branches don't have work-trees.)

    What branch names do is select one particular commit. That commit then has some parent commit (or more than one if it is a merge); the parent has its own parent, and so on, forming a chain of commits. As a result we can say that a branch (or branch name) contains some set of commits: these are the commits that are reachable from the name. For a good introduction to all of this, see Think Like (a) Git.

    Each commit has a stored tree or snapshot. As you have found, git diff-tree is the so-called plumbing command (script-able work-horse) for comparing two trees as found in two commits:

    git diff-tree -r commit1 commit2
    

    compares the entire trees, recursively (-r), comparing all the files contained in the two snapshots. You can spell this commit1..commit2: it means exactly the same thing, namely use commit1 as the left side and commit2 as the right side in the comparison. The output of this comparison is essentially a sequence of instructions (e.g., add some lines to this file, remove some lines from that one) that, when followed, make the tree attached to commit1 match the tree attached to commit2.

    You can add options like --name-only or --name-status to shorten the output. You can also add --find-renames, with an optional similarity index value (a percentage), to have Git find places where renaming a file and then changing it produces a shorter instruction sequence than the simpler remove file A, and create new file B from scratch with these contents. For instance, perhaps the shorter sequence is rename A to B, then remove line 17, which is clearly much shorter than remove file A, then create file B with these 10,000 lines: [very long list of lines].
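
    To make the effect of these options concrete, here is a minimal sketch that builds a throwaway repository and runs the same comparison with and without rename detection. The temporary directory, the demo identity, and the file name renamed.txt are my own assumptions for illustration; only master, development, document.txt and virus.exe come from your question.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical setup, not the asker's real repository).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make the first branch "master"
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config commit.gpgsign false

printf 'first document\n' > document.txt
git add document.txt
git commit -q -m "add document one"

# development: rename one file without changing it, and add another.
git checkout -q -b development
git mv document.txt renamed.txt
printf 'payload\n' > virus.exe
git add virus.exe
git commit -q -m "rename document, add virus.exe"

# master's tip snapshot on the left, development's on the right.
# Without rename detection the rename is reported as a delete plus an add.
plain=$(git diff-tree -r --name-status master development)
printf 'without --find-renames:\n%s\n' "$plain"

# With rename detection the same change is reported as one rename entry.
renamed=$(git diff-tree -r --name-status --find-renames master development)
printf 'with --find-renames:\n%s\n' "$renamed"
```

    Note that the plumbing command only detects renames when asked to, whatever your diff.renames setting says, which is why the first listing treats renamed.txt as a newly added file.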

    The front-end, user-oriented git diff command in effect runs git diff-tree or git diff-index or whatever, but with the user's configuration (diff.renames and diff.renameLimit for instance) in mind, and with output sent through a pager, colorized, and so on. Git calls these commands porcelain as they are supposed to be user-friendly (vs the plumbing, hidden behind the walls, out of sight).

    When you make a new commit, you have Git store a new snapshot. Git builds the new snapshot from whatever Git has in its index at the time you run git commit. The new commit's parent is the old tip of the current branch; the new commit becomes the tip of the current branch. This is how branches grow.

    The various commands you use, such as git rm or git add, operate on the index. You use git checkout branch to extract the tip commit of a branch into this index. The files stored in the index are hard to see—you can get a complete listing with git ls-files --stage, but that's rarely very useful. These files, in the index and (eventually) frozen into a commit, are in a special Git-only form anyway. So to work with those files, when you have Git extract the tip commit, you also have Git extract all the files into a work-tree.

    The work-tree has those files in their ordinary format, where you, and your computer, can work with them in ordinary ways. But everything you do in this work-tree is really aimed at fussing with the index, because it's the index that contains what goes in the next commit. Running git commit packages up (freezes) the index contents into a snapshot, and adds that as the new tip of the branch.
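
    A small experiment may help show that it is the index, not the work-tree, that ends up in the commit. This is only a sketch in a throwaway repository; the file names and contents are assumptions made up for the demonstration.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical setup for illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config commit.gpgsign false

printf 'version one\n' > document.txt
git add document.txt
git commit -q -m "add document"

# Edit document.txt in the work-tree only; do NOT git add it.
printf 'version two, edited\n' > document.txt

# Stage a different, new file.
printf 'other\n' > other.txt
git add other.txt

# The index now holds the old document.txt plus the new other.txt.
index_listing=$(git ls-files --stage)
printf '%s\n' "$index_listing"

git commit -q -m "add other"

# The new snapshot contains the index copy of document.txt,
# not the edited work-tree copy.
committed=$(git show HEAD:document.txt)
printf 'committed: %s\n' "$committed"
printf 'work-tree: %s\n' "$(cat document.txt)"
```

    The edit to document.txt stays in the work-tree, uncommitted, until you copy it into the index with git add.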

    Hence, if you compare two commits (with git diff or git diff-tree), then make some changes to your index and make a new commit, what you are doing is changing your index and using it to make a new commit. The comparison of the two commits is up to you. Note that you can also compare any one commit to your index, using git diff --cached (porcelain) or git diff-index --cached (plumbing). And, you can compare index vs work-tree, or a commit vs work-tree, also using these commands.
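
    Putting these pieces together, one possible way to feed the comparison into git rm is sketched below. It is one option under stated assumptions, not the only approach: the throwaway repository and demo identity are made up, and I am assuming an xargs that supports -0 (GNU and BSD versions do). If the comparison finds no added files, some xargs implementations still run git rm with no arguments, so a real script would want to guard against that.

```shell
#!/bin/sh
set -e

# Throwaway repository mirroring the question's layout (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config commit.gpgsign false

printf 'one\n' > document.txt
git add document.txt
git commit -q -m "add document one"

git checkout -q -b development
printf 'payload\n' > virus.exe
git add virus.exe
git commit -q -m "add virus.exe"

# Paths that exist in development's tip but not in master's, NUL-separated
# (-z) so unusual file names survive the pipe. git rm removes each one from
# the index and from the work-tree.
git diff-tree -r -z --name-only --diff-filter=A master development |
    xargs -0 git rm -q --

# Compare the current tip commit with the index: the pending deletion shows up.
staged=$(git diff-index --cached --name-status HEAD)
printf 'staged: %s\n' "$staged"

# Freeze the index into a new commit on top of development.
git commit -q -m "clean development"

# Nothing is left that development adds relative to master.
remaining=$(git diff-tree -r --name-only --diff-filter=A master development)
printf 'remaining added files: [%s]\n' "$remaining"
```

    The earlier commit that added virus.exe stays in the history of development; only the new tip snapshot no longer contains the file.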