I have a weird problem with the git log command. This command:
git log --pretty=format: --name-only --diff-filter=A
lists the file .xyz.yml, but when I run this command:
git log --pretty="%ad" --diff-filter=A -- .xyz.yml
to retrieve the time the file was added to the repository, it returns nothing.
Is there any solution for it?
I would be grateful for any kind of clue.
Edit:
When I try to get full history:
git log --full-history -- .xyz.yml
the output shows a brief and incomplete history:
commit b26d833b9da805d5d58c429a4af2d1a5c5b0bad9
Author: author name
Date: Mon Dec 19 14:07:17 2016 -0500
Code config (#606)
* Create .xyz.yml
Created a Code config that uses your setup (also, enabled our Duplication engine).
* Update .xyz.yml
* Update .xyz.yml
* Update .xyz.yml
* Update .xyz.yml
* Update .xyz.yml
* Update .xyz.yml
* Update .xyz.yml
Even though the file is no longer present in the head, the history does not show any deletion...
I have also had a look at the commit history in the GitHub user interface and I see a whole different world there:
The first commit date is even different than what I can find in --full-history.
(Note: if you don't already know how commits store full snapshots of every file, and link together through their metadata parent information, see, e.g., my answer here.)
The question before the edit and the question after it are different, but they are related in terms of what's going on. The git log command can perform something Git calls History Simplification. Search the git log documentation for this two-word phrase, and you will find a section that begins with this strangely-worded paragraph:
Sometimes you are only interested in parts of the history, for example the commits modifying a particular <path>. But there are two parts of History Simplification, one part is selecting the commits and the other is how to do it, as there are various strategies to simplify the history.
Before we tackle the odd wording here, note that history simplification only occurs if you ask for it. The ways to ask for it are:
path arguments, as in git log -- .xyz.yml: the .xyz.yml here is a path. (The -- is optional in some cases, and marks the remainder of arguments as paths. If the named paths exist in the current commit and do not resemble other git log options, the -- is not required. It's a good idea to get in the habit of using it always, though, so that you don't have to figure out whether it's required for this particular git log invocation.)
Since in your troublesome case, you did use -- .xyz.yml, you did ask for History Simplification, even if you did not realize that you asked for it. That's why I added my comment; your reply that using --full-history fixed the problem proved that the default-mode simplification was in fact the problem.
You then asked:
full history returns introduction commit. Whats the reason?
The answer lies in what the documentation calls Default mode:
Simplifies the history to the simplest history explaining the final state of the tree. Simplest because it prunes some side branches if the end result is the same (i.e. merging branches with the same content)
This is still rather inexplicable though. The initial paragraph talks about selecting the commits and then how to do it. I think what is missing here is that the documentation never talks about how git log really works.
What we need to know, and what the documentation fails to say, is that the way git log works is to scan through a queue of commits. This queue is a priority queue, i.e., a "higher priority" commit floats up to the front of the queue and gets examined first; a "lower priority" commit that is already in the queue gets pushed towards the back of the line by this higher priority commit. The git log command thus handles just one commit at a time out of this queue.
The queue itself is loaded, initially, from any commits you specify on the command line. For instance, you can run:
git log branch1 branch2 branch3
This uses git rev-parse to turn each of branch1, branch2, and branch3 into a commit hash ID. The resulting three commit hash IDs (assuming we get three different ones) are loaded into the queue. If we get duplicates, the queue has two or even just one commit hash ID in it, at this point. For instance, if the names branch2 and branch3 select the same commit, while branch1 selects a different commit, the queue now has just two commits in it.
(If you don't pick any starting commit, git log will use HEAD as the starting point commit. Its sister command, git rev-list, doesn't have this particular feature, so any time you use git rev-list instead of git log, make sure you give it an explicit starting point.)
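To see the seeding and the ordering for yourself, here is a minimal sketch in a throwaway repository. It assumes the default ordering, where the queue's priority is the committer date (newest first). The branch names match the example above; the commit_at helper, the commit subjects, and the timestamps are made up for the demo.

```shell
#!/bin/sh
# Sketch: starting points are de-duplicated, then commits come out newest first.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1   # name the initial branch branch1

# Hypothetical helper: empty commit with a fixed timestamp (epoch seconds).
commit_at() {
    GIT_AUTHOR_DATE="$1 +0000" GIT_COMMITTER_DATE="$1 +0000" \
        git commit -q --allow-empty -m "$2"
}

commit_at 1600000000 root
commit_at 1600000001 a1
git checkout -q -b branch2 HEAD~1          # branch2 forks from root
commit_at 1600000002 b1
git checkout -q branch1
commit_at 1600000003 a2
git checkout -q branch2
commit_at 1600000004 b2
git branch branch3                         # branch3 names the same commit as branch2

# Three names, but only two distinct commits seed the queue.
unique=$(git rev-parse branch1 branch2 branch3 | sort -u | wc -l | tr -d ' ')
echo "distinct starting commits: $unique"

# The walk interleaves both branches by commit date.
order=$(git log --format=%s branch1 branch2 branch3 | xargs)
echo "visit order: $order"
```

With these timestamps the script reports 2 distinct starting commits and the visit order b2 a2 b1 a1 root: the two branches are interleaved by date rather than printed one after the other.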
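The git log code now enters its main loop. This loop:
- takes the commit at the front of the queue;
- decides whether to print that commit, based on the git log arguments; and
- decides which of that commit's parents to put into the queue, based on the git log arguments.
When we ask git log to say things about a file like .xyz.yml, the decision about whether to print the commit has to compare the commit's snapshot to its parent's snapshot. We now want to scan down a bit in the documentation to this section: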
A more detailed explanation follows.
Suppose you specified foo as the <paths>. We shall call commits that modify foo !TREESAME, and the rest TREESAME. (In a diff filtered for foo, they look different and equal, respectively.) [snip]
(Read the rest and work through their example, too, either before or after reading the rest of this answer.)
What Git is really going to do, internally, is take the snapshot for this commit, whatever it is, and strip away all files except those you listed. In this case, the one file you listed is .xyz.yml; in their example, the one file is named foo instead. But you can give a directory path here, and Git will strip away all files except those that are in that directory, or multiple paths, and Git will strip away all but those paths, too. This all works for the so-called TREESAME test. It's just easiest to understand when we're looking at one single file, because either the commit has some particular version of the file, or the commit lacks the file entirely: those are the only two possibilities. So two commits are "the same" (TREESAME) if both lack the file, or if both have the file and use the same version of the file.
If we have a normal, everyday, non-merge commit with a single parent commit, this is all pretty straightforward. Consider the following simple chain of commits:
... <-F <-G <-H
Here, commit H has some snapshot. H's parent, commit G, has some snapshot. G's parent F has some snapshot too, of course, and so on down the line. Probably each snapshot is different, but if we strip them down to just one file of interest (file foo, or file .xyz.yml), commits G and H may have the same file. If so, G and H are TREESAME to each other. The copy in commit F, however, might be different: F and G are not TREESAME.
What this means is that Git won't mention commit H. It has no change to the file. Git will mention commit G: it has a change to the file, as compared to its parent F. This is the first use of the TREESAME concept: when we ask Git about particular files, it prints only commits that are not TREESAME to their parent commit, that is, commits in which at least one of the files we're asking about changed.
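The single-file TREESAME test can be approximated by hand: two commits are TREESAME for a file when they record the same blob for it, or when both lack it. The sketch below builds a three-commit chain and compares neighbours. blob_of and treesame are hypothetical helpers written for this demo, not Git commands, and the demo's F is a root commit rather than one with earlier history behind it.

```shell
#!/bin/sh
# Sketch: a hand-rolled TREESAME check for one file, next to git log's output.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q

echo one > foo;  git add foo;   git commit -q -m F   # creates foo
echo two > foo;  git add foo;   git commit -q -m G   # changes foo
echo x > other;  git add other; git commit -q -m H   # leaves foo alone

# Hypothetical helper: blob ID of <path> in <commit>, or "absent".
blob_of() { git rev-parse -q --verify "$1:$2" 2>/dev/null || echo absent; }

# Hypothetical helper: succeeds when both commits hold the same version of
# <path>, or both lack it.
treesame() { [ "$(blob_of "$1" "$3")" = "$(blob_of "$2" "$3")" ]; }

if treesame HEAD~1 HEAD foo;   then g_h=TREESAME; else g_h='!TREESAME'; fi
if treesame HEAD~2 HEAD~1 foo; then f_g=TREESAME; else f_g='!TREESAME'; fi
echo "G vs H: $g_h"
echo "F vs G: $f_g"

# H is skipped; G is printed. F is printed too, because a root commit is
# compared with an empty tree, so creating foo counts as a change.
shown=$(git log --format=%s -- foo | xargs)
echo "git log -- foo prints: $shown"
```

Here G vs H comes out TREESAME and F vs G comes out !TREESAME, and git log -- foo prints G and F but not H.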
This only handles simple, ordinary commits like F, G, and H. What about merge commits? Our branch might have these commits in it:
       I--J
      /    \
...--H      M--N--...
      \    /
       K--L
When Git is doing the TREESAME test for the (M, N) pair, that part is straightforward. Although M is a merge commit, it has a snapshot, just like any commit. So we reduce the snapshots in M and N to the file(s) of interest and decide whether the result is TREESAME. If so, we don't print N, and move on to M; if not, we do print N, and move on to M.
Now we have to decide if commit M is TREESAME to its parent. But hang on, M does not have a parent. M has two parents, J and L. Which one should we compare?
Git's answer is to compare all of them: to try a TREESAME(J, M) and a TREESAME(L, M). Git now knows whether M is TREESAME to all parents, or to some parents, or to no parents. If M is TREESAME to any parent, it is not printed; otherwise, it is printed. Now the real complication sets in.
git log can put some or all parents into the queue
Having printed or not printed commit M, git log must now decide:
- Should I put J into the queue?
- Should I put L into the queue?
When not doing history simplification, Git will put both parents into the queue. (Well, not if you used the --first-parent option. But, since you didn't, we'll just ignore the option entirely.) But when doing history simplification, what goes into the queue depends on --full-history:
- With --full-history, all parents go into the queue.
- Without --full-history, pick one parent that is TREESAME (just one, even if several parents are TREESAME). If no parent is TREESAME, pick all parents. Put these into the queue.
(Note that some merge commits might have 3 or more parents; the same rules apply to these many-parent merge commits. Here we only have a two-parent merge, so the phrase "all parents" means "both parents".)
Now, suppose our file-of-interest is introduced in commit K or L. It's absent from commits H, I, and J, and (importantly) it's absent in M as well: the merge omits the file. Since commit M, after stripping away all but our one file-of-interest, is TREESAME to commit J after the same stripping-away, Git follows M back to J, completely ignoring commit L. (Note: it could be that our file is also absent from L, but for whatever reason, Git chooses to follow commit J instead of L as its single TREESAME commit, while doing history simplification.)
In this case, the history simplification code completely stops git log from looking at the bottom row of commits. The queue never contains commit K, where the file is first introduced. A scan to find the file never finds it, because Git never peruses the history (the commit) in which the file is introduced.
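The scenario is easy to reproduce in a scratch repository. The sketch below assumes the graph drawn earlier, with the file added in K, changed in L, and left out of the merge M. The branch names main and side and the one-letter commit subjects are choices made for the demo.

```shell
#!/bin/sh
# Sketch: a merge that drops a file hides the file's history in default mode.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main       # fixed name for the first branch

echo base > README;  git add README;   git commit -q -m H
git checkout -q -b side
echo a > .xyz.yml;   git add .xyz.yml; git commit -q -m K   # file introduced
echo b > .xyz.yml;   git add .xyz.yml; git commit -q -m L   # file updated
git checkout -q main
echo i >> README;    git add README;   git commit -q -m I
echo j >> README;    git add README;   git commit -q -m J
git merge -q --no-commit side               # start merge M ...
git rm -q -f .xyz.yml                       # ... but leave the file out
git commit -q -m M
echo n >> README;    git add README;   git commit -q -m N

# Default simplification follows M back to J only, so K is never visited.
default=$(git log --format=%s --diff-filter=A -- .xyz.yml | xargs)
echo "default mode finds: '$default'"

# --full-history also walks the side branch and finds the introducing commit.
full=$(git log --full-history --format=%s --diff-filter=A -- .xyz.yml | xargs)
echo "full history finds: '$full'"
```

The default-mode query prints nothing, while the --full-history query prints K, which mirrors the empty result in the question.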
The idea behind this simplification is to explain why you have the files you have now. By not following the history from merge commit M back to commit L, in our example, we never find the file .xyz.yml. But that's what we want, because file .xyz.yml is not in the current commit N, or wherever it is that we started. We've asked Git to explain the files that are there. File .xyz.yml isn't there, and therefore the explanation as to why it isn't there is that it was never there in the history-of-interest: the history that explains why it's still not there.
Your goal, of course, is to figure out where it was introduced and where it was lost. The fact is that it was lost at a merge, when someone decided: We don't need this stupid .xyz.yml in my merge result! Let's keep it out! That is, it was in some commit L, and isn't in its immediate successor merge M.
The way I know this is from your final git log output, when you took out the --diff-filter option:
git log --full-history -- .xyz.yml
We see a commit that adds the file, and some commits that modify it, but we don't see any commit that deletes it. The reason we do not see this commit is because our merge M is TREESAME to at least one of its parents: J, in our example. So merge commit M is simply not printed.
If it were printed, we'd still have a potential issue, because the way a merge is printed is a little funky. All commits have their hash ID and log message printed. If you ask for --name-status or --patch, you may also get the result of a git diff of some sort. For ordinary commits, this is a diff against the (single) parent commit. For merge commits, though, git log has a problem:
- without -c, --cc, or -m, git log lazily skips printing the diff entirely; and
- with -c or --cc, you get a combined diff.
Combined diffs omit some files. In particular, they omit any file where at least one parent and the merge have the same version of the file, or in this case, both lack the file. So a combined diff won't mention that between L and M, the file got deleted. Only the -m style diff will mention the deletion here.
The -m option does a "virtual split" of the merge commit. If merge X has parents P1, P2, P3, ..., Pn, you get n diffs: P1-vs-X, P2-vs-X, P3-vs-X, and so on up through Pn-vs-X. For our particular case, then, we would get two diffs: J vs M, and L vs M. The J-vs-M diff would show nothing at all for .xyz.yml, but the L-vs-M would show the deletion.
(Note that -m also modifies the way git log decides whether to print the merge at all: now that it's been split, it gets printed if it's not TREESAME to at least one parent. That's important too, here.)
If you're trying to figure out where some file got deleted, you may need git log --full-history --diff-filter=D -m -- path. This forces git log to go through all parents of each merge and to check whether the merge itself is the reason the file doesn't exist in the commit from which you're starting.
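As a rough check of that command, the sketch below rebuilds the same kind of history (file added in K, changed in L, dropped by merge M) and then looks at M both ways: one diff per parent, which is what -m does, and the full git log invocation. The branch names and the tag M are made up for the demo.

```shell
#!/bin/sh
# Sketch: locate the merge that dropped a file, using per-parent diffs.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main       # fixed name for the first branch

echo base > README;  git add README;   git commit -q -m H
git checkout -q -b side
echo a > .xyz.yml;   git add .xyz.yml; git commit -q -m K
echo b > .xyz.yml;   git add .xyz.yml; git commit -q -m L
git checkout -q main
echo j >> README;    git add README;   git commit -q -m J
git merge -q --no-commit side               # merge M: parents are J and L
git rm -q -f .xyz.yml                       # the merge result omits the file
git commit -q -m M
git tag M                                   # demo-only name for the merge
echo n >> README;    git add README;   git commit -q -m N

# The "virtual split": one diff per parent of M.
vs_first=$(git diff --name-status "M^1" M -- .xyz.yml | xargs)    # J vs M
vs_second=$(git diff --name-status "M^2" M -- .xyz.yml | xargs)   # L vs M
echo "J vs M: '$vs_first'"
echo "L vs M: '$vs_second'"

# Walk every parent and split merges to find the commit that dropped the file.
dropped_by=$(git log --full-history --diff-filter=D -m --format=%s -- .xyz.yml | head -n 1)
echo "deleted by: $dropped_by"
```

The J-vs-M diff is empty, the L-vs-M diff reports the deletion of .xyz.yml, and the log command names M as the commit that removed the file.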