I have the following two branches that differ in files
master
|_ document.txt
|_ document_two.txt
|_ document_three.txt
development
|_ document.txt
|_ document_two.txt
|_ document_three.txt
|_ virus.exe // want to get rid of that
And these are the git log results of both (from top to bottom):
master
commit: fdsdfsf1342re5252423425242234 (master, development)
add document one
commit: 563523523g233g5232sdfawe22434 (master, development)
add document two
commit: 56652u747241523g52352fsdfawew (master, development)
add document three
development
commit: fdsdfsf1342re5252423425242234 (master, development)
add document one
commit: 563523523g233g5232sdfawe22434 (master, development)
add document two
commit: 1213421g233g5232s41dfawe22434 (development)
ADD VIRUS.EXE ! XXX
commit: 5423345652u7433g52352fsdf1223 (development)
change document three completly
commit: 56652u747241523g52352fsdfawew (master, development)
add document three
I want to clean up the development branch so that it has all the source file states of master and has the virus.exe file removed.
I already found ways to have the master state overwrite the development file states.
Read here: Make the current Git branch a master branch
But I also want to get rid of files that do not exist within the working set of master. In this example this would be the file virus.exe. Just creating a new development branch from master is not an option, because it is important to leave the development branch as an orphan. When overwriting the development branch with master, or branching it off master, it will lose its orphan state, since it will adopt the whole history of master.
Does anybody know a way to:
Edit: I found the command to give me the difference in files:
$ git diff-tree -r --name-status --diff-filter=A master..development
A virus.exe
Is there an elegant way to use that output directly to remove those files from the development branch? I would use it in something like:
git checkout development && git rm virus.exe && git commit -m "clean development" && git push origin
Branches don't have working sets. (Well, maybe they do, since you have not defined working set, but Git does not define working set either. But since I don't know what you mean, I'm using working set as an alias for work-tree, which is well-defined, and branches don't have work-trees.)
What branch names do is select one particular commit. That commit then has some parent commit (or more than one if it is a merge); the parent has its own parent, and so on, forming a chain of commits. As a result we can say that a branch (or branch name) contains some set of commits: these are the commits that are reachable from the name. For a good introduction to all of this, see Think Like (a) Git.
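To make "reachable from the name" concrete, here is a minimal sketch in a throwaway repository. The repository, file contents, and commit messages are made up for the demonstration; only the branch names echo the question.

```shell
set -e
# Throwaway repository; everything in it is hypothetical demo data.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > document.txt
git add document.txt
git commit -q -m "add document one"

git checkout -q -b development
echo bad > virus.exe
git add virus.exe
git commit -q -m "add virus.exe"

# Count the commits reachable from each branch name.
master_count=$(git rev-list --count master)
development_count=$(git rev-list --count development)
# Commits reachable from development but not from master.
only_dev=$(git rev-list --count master..development)
echo "master: $master_count, development: $development_count, only on development: $only_dev"
```

Both names reach the first commit; only development reaches the second one. That is all "a branch contains a commit" means here.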
Each commit has a stored tree or snapshot. As you have found, git diff-tree is the so-called plumbing command (script-able work-horse) for comparing two trees as found in two commits:
git diff-tree -r commit1 commit2
compares the entire trees, recursively (-r), comparing all the files contained in the two snapshots. You can spell this commit1..commit2: it means exactly the same thing, use commit1 as the left side and commit2 as the right side in the comparison. The output of this comparison is essentially a sequence of instructions, e.g., add some lines to this file, remove some lines from that one, and you'll cause the tree attached to commit1 to match the tree attached to commit2.
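As a quick check that the two spellings agree, the sketch below builds a small throwaway repository (all file names and contents are stand-ins) and runs git diff-tree both ways.

```shell
set -e
# Throwaway repository; the files are hypothetical stand-ins for the question's.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > document.txt
git add document.txt
git commit -q -m "add document one"

git checkout -q -b development
echo bad > virus.exe
git add virus.exe
git commit -q -m "add virus.exe"

# Left side: master's tree. Right side: development's tree.
two_args=$(git diff-tree -r --name-status master development)
dot_dot=$(git diff-tree -r --name-status master..development)
echo "$two_args"
```

Both variables should hold the same single line, reporting virus.exe as added (A) on the right-hand side.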
You can add options like --name-only or --name-status, and --find-renames with an optional similarity index value (a percentage) to have Git compute places where renaming, then changing, a file produces a shorter instruction sequence than the simpler remove file A, and create new file B from scratch with these contents. For instance, perhaps the shorter sequence is rename A to B, then remove line 17, which is clearly much shorter than remove file A, then create file B with these 10,000 lines: [very long list of lines].
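Here is a small sketch of the rename case, assuming a made-up ten-line file A.txt that gets renamed to B.txt with one line dropped. The 50% threshold is just an illustrative choice.

```shell
set -e
# Throwaway repository; A.txt and B.txt are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Ten lines, so that dropping one leaves the file mostly unchanged.
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > A.txt
git add A.txt
git commit -q -m "add A"

git mv A.txt B.txt
grep -v '^line 7$' B.txt > B.tmp
mv B.tmp B.txt
git add B.txt
git commit -q -m "rename A to B, remove one line"

# Without rename detection: one deletion plus one addition.
plain=$(git diff-tree -r --name-status HEAD~1 HEAD)
# With rename detection: a single R<score> entry naming both paths.
renames=$(git diff-tree -r --name-status --find-renames=50% HEAD~1 HEAD)
printf 'plain:\n%s\nwith --find-renames:\n%s\n' "$plain" "$renames"
```

The exact similarity score printed after the R depends on the file contents, so treat the number as informational.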
The front-end, user-oriented git diff command in effect runs git diff-tree or git diff-index or whatever, but with the user's configuration (diff.renames and diff.renameLimit for instance) in mind, and with output sent through a pager, colorized, and so on. Git calls these commands porcelain as they are supposed to be user-friendly (vs the plumbing, hidden behind the walls, out of sight).
When you make a new commit, you have Git store a new snapshot. Git builds the new snapshot from whatever Git has in its index at the time you run git commit. The new commit's parent is the old tip of the current branch; the new commit becomes the tip of the current branch. This is how branches grow.
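The parent/tip relationship is easy to observe. This is a minimal sketch in a throwaway repository with made-up files:

```shell
set -e
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > document.txt
git add document.txt
git commit -q -m "add document one"
old_tip=$(git rev-parse master)

echo two > document_two.txt
git add document_two.txt
git commit -q -m "add document two"
new_tip=$(git rev-parse master)

# The parent of the new tip is the commit that used to be the tip.
parent=$(git rev-parse master~1)
echo "old tip: $old_tip"
echo "new tip: $new_tip"
echo "parent of new tip: $parent"
```

The name master moved to the new commit, and the new commit points back at the old one.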
The various commands you use, such as git rm or git add, operate on the index. You use git checkout branch to extract the tip commit of a branch into this index. The files stored in the index are hard to see: you can get a complete listing with git ls-files --stage, but that's rarely very useful. These files, in the index and (eventually) frozen into a commit, are in a special Git-only form anyway. So to work with those files, when you have Git extract the tip commit, you also have Git extract all the files into a work-tree.
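For the curious, here is a sketch of what the index listing looks like before and after a git rm, again in a throwaway repository with stand-in files:

```shell
set -e
# Throwaway repository; the files are hypothetical stand-ins.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > document.txt
echo bad > virus.exe
git add document.txt virus.exe
git commit -q -m "add both files"

# Each line: mode, blob hash, stage number, then the path.
before=$(git ls-files --stage)

# git rm removes the entry from the index and the file from the work-tree.
git rm -q virus.exe
after=$(git ls-files --stage)

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

Nothing has been committed by the git rm itself; the index simply no longer lists virus.exe, so the next commit will not contain it.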
The work-tree has those files in their ordinary format, where you, and your computer, can work with them in ordinary ways. But everything you do in this work-tree is really aimed at fussing with the index, because it's the index that contains what goes in the next commit. Running git commit packages up (freezes) the index contents into a snapshot, and adds that as the new tip of the branch.
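One way to convince yourself that git commit snapshots the index and not the work-tree is the following sketch. The file name and the "version N" contents are invented for the demonstration.

```shell
set -e
# Throwaway repository; contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "version 1" > document.txt
git add document.txt
git commit -q -m "first"

echo "version 2" > document.txt
git add document.txt                # the index now holds "version 2"
echo "version 3" > document.txt     # work-tree only, never added

git commit -q -m "second"

committed=$(git show HEAD:document.txt)
in_worktree=$(cat document.txt)
echo "committed: $committed"
echo "work-tree: $in_worktree"
```

The commit holds what was in the index when git commit ran; the later work-tree edit stays uncommitted until it is added.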
Hence, if you compare two commits (with git diff or git diff-tree), then make some changes to your index and make a new commit, what you are doing is changing your index and using it to make a new commit. The comparison of the two commits is up to you. Note that you can also compare any one commit to your index, using git diff --cached (porcelain) or git diff-index --cached (plumbing). And, you can compare index vs work-tree, or a commit vs work-tree, also using these commands.
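Putting the pieces together, one possible way to feed the comparison into git rm is sketched below. It is not the only way, and it makes some assumptions: the repository is a throwaway with stand-in files, the two branches share history here (the trees are compared the same way for an orphan branch), and path names contain no newlines or other characters that Git would quote in this output. It also only removes the extra files; it does not reset changed files to master's versions.

```shell
set -e
# Throwaway repository; the files are hypothetical stand-ins for the question's.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > document.txt
git add document.txt
git commit -q -m "add document one"

git checkout -q -b development
echo bad > virus.exe
git add virus.exe
git commit -q -m "add virus.exe"

# Files present in development's tree but absent from master's (status A),
# one path per line; remove each from the index and the work-tree.
git diff-tree -r --name-only --diff-filter=A master development |
while IFS= read -r path; do
    git rm -q -- "$path"
done

# Compare the current commit against the index before committing.
staged=$(git diff --cached --name-status)
echo "staged: $staged"

git commit -q -m "clean development"

# After the commit, nothing should exist only on development any more.
remaining=$(git diff-tree -r --name-only --diff-filter=A master development)
```

For arbitrary path names, the -z option of git diff-tree together with a NUL-aware consumer would be the more careful route.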