I have a repository containing 5 files that were committed with CRLF line endings. I don't know how this happened, but on a clean checkout the following command lists 5 files (out of hundreds):
git grep -I --files-with-matches --perl-regexp '\r' HEAD
Does anyone know how I can reproduce this issue? In other words, what set of Git settings can lead to this situation?
Internally, Git just stores raw data. If you run `git hash-object -w`, you can push any blob data you like into the repository (though you would then need to attach a tag, or add the blob to the index to get it stored into a new commit).
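As a minimal sketch of that point, the following writes a blob containing a CRLF straight into the object database and then registers it in the index. The temporary repository and the path name `crlf.txt` are assumptions made up for illustration.

```shell
# Work in a throwaway repository (assumption: any empty repo will do).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Store raw bytes as a blob. With --stdin and no --path, no conversion filters run.
blob=$(printf 'hello\r\n' | git hash-object -w --stdin)

# Ask Git for the stored size: 'hello' plus CR plus LF.
size=$(git cat-file -s "$blob")
echo "blob $blob holds $size bytes"

# Attach the blob to a (hypothetical) path in the index so the next commit picks it up.
git update-index --add --cacheinfo 100644,"$blob",crlf.txt
```

Since nothing here passes through `git add`, the carriage return reaches the repository regardless of any line-ending settings.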
As I noted in my answer to What does "check out code" mean in git documentation for line endings?, Git will apply CRLF-to-LF-only line-endings translation on any file on which such translations are enabled, at the time you run `git add` on that file. The result is that the version of the file in the index (or more precisely, the blob hash in the index, representing the in-repo blob object) has LF-only line endings.
If you run `git add` on that file with those translations disabled, then Git won't do them, and the index version of the file will have any `'\r'` characters it had in the work-tree version.
The settings in `.gitattributes` and/or `core.autocrlf` control whether translations are enabled, and if so, which translations to perform. Due to historical settings (from back when Git did nothing at all, to the early stages of adding Windows support, through various intermediate versions of Git, to the current rather complicated `.gitattributes` method) the rules for all of this are quite complicated.
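To see the two `git add` behaviours side by side, here is a small sketch that adds the same CRLF content under two paths, one with translation enabled and one with it disabled. The file names are hypothetical, and `core.safecrlf` is turned off locally only so that a strict global setting cannot abort the `git add`.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.safecrlf false   # assumption: avoid round-trip warnings or errors

# One path with translation enabled, one with it disabled (names are made up).
printf 'converted.txt text\nraw.txt -text\n' > .gitattributes

# Identical CRLF content in both work-tree files.
printf 'line\r\n' > converted.txt
printf 'line\r\n' > raw.txt
git add .gitattributes converted.txt raw.txt

# i/ reports the line endings stored in the index, w/ those in the work-tree.
git ls-files --eol converted.txt raw.txt
```

The `i/` column should differ between the two paths even though the work-tree files are byte-identical, which is the effect described above.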
In other words what is a set of git settings that can lead to this situation?
There are many different ways to do it, but by far the simplest is to write a `.gitattributes` file with just:
* -text
or to set `core.autocrlf` to `false` (but note that `.gitattributes` overrides `core.autocrlf`, in general). Now Git will treat all files as binary, doing no "cleaning" during `git add` and no "smudging" during `git checkout`. The work-tree contents will now match the index contents byte-for-byte, except for any changes you make yourself, or make by running programs, to work-tree files. You can then `git add` those new files to the index and it will copy them in, byte-for-byte; and each new `git commit` you make will use what's in the index.
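One possible reproduction of the situation in the question, assuming a scratch repository and a made-up file name: commit a CRLF file while conversion is disabled, then search the commit for carriage returns. The search uses a literal CR character rather than `--perl-regexp`, only because not every Git build includes PCRE support.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Disable all line-ending conversion, then commit a file that contains CRLF.
printf '* -text\n' > .gitattributes
printf 'first\r\nsecond\r\n' > crlf.txt     # hypothetical file name
git add .gitattributes crlf.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m 'Add CRLF file'

# Search the committed tree for a literal carriage return.
cr=$(printf '\r')
found=$(git grep -I --files-with-matches "$cr" HEAD)
echo "$found"
```

The file with CRLF endings should be the only one listed, since `.gitattributes` itself was written with LF-only endings.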
Once you have stored, as permanent and unchangeable commits, the particular versions of particular files you care about, you can modify `.gitattributes` to contain any other settings you would like to test, and run `git checkout <commit> -- <path>` to make Git copy the file from a commit, to the index, through the smudging filters, and into the work-tree. You can modify any work-tree file any way you like, then run `git add <path>` to run the file through the cleaning filters to copy it into the index. These filters will be controlled by whatever you have in `.gitattributes` at the time you run the commands, so you can experiment with different attributes without having to make new commits.
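The experiment loop might look like the sketch below. It commits an LF-only file with conversion off, swaps in a different `.gitattributes` without committing it, and re-extracts the file. The file name and the `* text eol=crlf` setting are just examples. Staging the edited `.gitattributes` is a precaution so that the index and work-tree copies agree, and the work-tree file is deleted first so that Git has to write it out again.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.safecrlf false   # assumption: keep a strict global setting from interfering

# Step 1: commit an LF-only file with all conversion disabled.
printf '* -text\n' > .gitattributes
printf 'first\nsecond\n' > sample.txt       # hypothetical file name
git add .gitattributes sample.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m 'Add sample'

# Step 2: try different attributes; no new commit is made.
printf '* text eol=crlf\n' > .gitattributes
git add .gitattributes

# Step 3: re-extract the file from the commit through the smudging filters.
rm sample.txt
git checkout HEAD -- sample.txt

# Compare index (i/) and work-tree (w/) line endings.
git ls-files --eol sample.txt
```

Repeating steps 2 and 3 with other attribute lines lets you compare results against the same committed blob.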