
How can a file be committed with CRLF on git?


I have a repository containing 5 files that have been committed with CRLF line endings. I don't know how this happened, but on a clean checkout the following command prints 5 files (out of hundreds):

git grep -I --files-with-matches --perl-regexp '\r' HEAD

Does anyone know how I can reproduce this issue? In other words, what is a set of git settings that can lead to this situation?


Solution

  • Internally, Git just stores raw data. If you run git hash-object -w you can push any blob data you like into the repository (though you would then need to attach a tag, or add the blob to the index to get it stored into a new commit).
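
A minimal sketch of that route, assuming a scratch repository in a temporary directory. The path crlf.txt, the commit message, and the identity passed with -c are hypothetical placeholders. It writes a CRLF blob with git hash-object -w, adds it to the index with git update-index, and commits it without any work-tree file being involved:

```shell
# Scratch repository in a throwaway temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Store raw bytes, CRs included, as a blob. Reading from stdin without
# --path means no attribute-based conversion is applied.
blob=$(printf 'one\r\ntwo\r\n' | git hash-object -w --stdin)
size=$(git cat-file -s "$blob")
echo "blob $blob holds $size bytes"

# Add the blob to the index under a hypothetical path, then commit it.
git update-index --add --cacheinfo 100644,"$blob",crlf.txt
git -c user.name=Example -c user.email=example@example.invalid \
    -c commit.gpgsign=false commit -q -m 'Commit a CRLF blob directly'

# Count the CR bytes that ended up in the committed file.
head_cr=$(git cat-file -p HEAD:crlf.txt | tr -cd '\r' | wc -c | tr -d ' ')
echo "CR bytes in HEAD:crlf.txt: $head_cr"
```

Because the blob never passes through git add, no line-ending settings come into play at all.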

    As I noted in my answer to What does "check out code" mean in git documentation for line endings?, Git will apply CRLF-to-LF-only line-endings translation on any file on which such translations are enabled, at the time you run git add on that file. The result is that the version of the file in the index (or more precisely, the blob hash in the index, representing the in-repo blob object) has LF-only line endings.

    If you run git add on that file with:

    • translations disabled globally, or
    • translations disabled on that particular path name

    then Git won't do those translations, and the index version of the file will have any '\r' characters it had in the work-tree version.
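
The difference can be seen side by side in a small sketch, assuming a scratch repository and two hypothetical files with identical CRLF content. One path has translation enabled through .gitattributes and the other has it disabled:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Two work-tree files with the same CRLF content (hypothetical names).
printf 'a\r\nb\r\n' > converted.txt
printf 'a\r\nb\r\n' > raw.txt

# Enable translation for one path and disable it for the other.
printf 'converted.txt text\nraw.txt -text\n' > .gitattributes

# core.safecrlf=false only silences the "CRLF will be replaced" warning.
git -c core.safecrlf=false add converted.txt raw.txt

# ":path" names the index version; count the CR bytes in each entry.
converted_cr=$(git cat-file -p :converted.txt | tr -cd '\r' | wc -c | tr -d ' ')
raw_cr=$(git cat-file -p :raw.txt | tr -cd '\r' | wc -c | tr -d ' ')
echo "converted.txt: $converted_cr CR bytes in index"
echo "raw.txt: $raw_cr CR bytes in index"
```

Only raw.txt keeps its '\r' characters in the index, so only raw.txt would be committed with CRLF.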

    The settings in .gitattributes and/or core.autocrlf control whether translations are enabled, and if so, which translations to perform. For historical reasons (Git began with no translation at all, then gained early Windows support, then passed through various intermediate versions before arriving at the current .gitattributes method), the rules for all of this are quite complicated.
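
Given how many settings interact, it can help to ask Git what it has resolved for a particular path. The sketch below assumes a scratch repository and a hypothetical sample.txt, and uses git check-attr, git config --get, and git ls-files --eol (the last one needs a reasonably recent Git) to show the attribute, the config value, and the line endings found in the index and work tree:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf '*.txt -text\n' > .gitattributes
printf 'x\r\n' > sample.txt
git add .gitattributes sample.txt

# How does the text attribute resolve for this path?
attr=$(git check-attr text -- sample.txt)
echo "$attr"

# What is core.autocrlf set to, if anything?
autocrlf=$(git config --get core.autocrlf || echo unset)
echo "core.autocrlf: $autocrlf"

# Line endings in the index (i/) and work tree (w/), plus the
# attribute in effect (attr/), for this one path.
eol=$(git ls-files --eol sample.txt)
echo "$eol"
```

Running the same three commands in the affected repository, on one of the 5 files, should show which rule was in effect for it at least as of the current checkout.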

    "In other words what is a set of git settings that can lead to this situation?"

    There are many different ways to do it, but the one that's the simplest by far is to write a .gitattributes file with just:

    * -text
    

    or to set core.autocrlf to false (but note that .gitattributes overrides core.autocrlf, in general). Git will then do no line-ending conversion on any file, treating them all like binary data: no "cleaning" during git add and no "smudging" during git checkout. The work-tree contents will now match the index contents byte-for-byte, except for any changes you make to work-tree files yourself or by running programs. You can then git add those changed files to the index, which copies them in byte-for-byte, and each new git commit you make will use what's in the index.
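
Put together, a reproduction might look like the following sketch. The scratch repository, the file name crlf.txt, and the commit identity are hypothetical. The final git grep is the command from the question, and it only works when Git was built with PCRE support, so it is allowed to fail here:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Disable line-ending conversion for every path.
printf '* -text\n' > .gitattributes

# A work-tree file with CRLF line endings (hypothetical name).
printf 'first\r\nsecond\r\n' > crlf.txt

git add .gitattributes crlf.txt
git -c user.name=Example -c user.email=example@example.invalid \
    -c commit.gpgsign=false commit -q -m 'Add CRLF file'

# The committed blob still contains the CR bytes.
committed_cr=$(git cat-file -p HEAD:crlf.txt | tr -cd '\r' | wc -c | tr -d ' ')
echo "CR bytes in HEAD:crlf.txt: $committed_cr"

# The check from the question; requires a Git built with PCRE support.
git grep -I --files-with-matches --perl-regexp '\r' HEAD || true
```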

    Once you have stored, as permanent and unchangeable commits, the particular versions of particular files you care about, you can modify .gitattributes to contain any other settings you would like to test, and run git checkout <commit> -- <path> to make Git copy the file from a commit, to the index, through the smudging filters, and into the work-tree. You can modify any work-tree file any way you like, then run git add <path> to run the file through the cleaning filters to copy it into the index. These filters will be controlled by whatever you have in .gitattributes at the time you run the commands, so you can experiment with different attributes without having to make new commits.
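
One round of such an experiment could be sketched as follows, assuming a scratch repository, a hypothetical notes.txt, and `* text eol=crlf` as the setting under test. The work-tree copy is removed before git checkout because Git may otherwise decide the existing file is already up to date and leave it alone:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Step 1: commit an LF-only file with all conversion switched off.
printf '* -text\n' > .gitattributes
printf 'first\nsecond\n' > notes.txt
git add notes.txt
git -c user.name=Example -c user.email=example@example.invalid \
    -c commit.gpgsign=false commit -q -m 'Add LF file'

# Step 2: change the attributes only; no new commit is needed.
printf '* text eol=crlf\n' > .gitattributes

# Step 3: remove the work-tree copy and extract it again, so the file
# is rewritten through the smudging side under the new attributes.
rm notes.txt
git checkout HEAD -- notes.txt
worktree_cr=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
echo "CR bytes in work tree: $worktree_cr"

# Step 4: add it back through the cleaning side and inspect the index.
git -c core.safecrlf=false add notes.txt
index_cr=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')
echo "CR bytes in index: $index_cr"
```

With these attributes the work-tree file gains CRLF endings while the index version stays LF-only. Swapping in a different .gitattributes line at step 2 lets you compare other settings against the same commit.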