Is a .gitattributes file really necessary for git?


I've recently been reading up a bit on .gitattributes and also found places like https://github.com/alexkaratarakis/gitattributes, which try to maintain gitattributes for all file types. However, looking through those files, my instinct is that this is an unmaintainable mess. You'd have to update the file any time you use a new file extension, or any time some software brings out a new one, which is just impossible. When you're working with a team of 30+ people, maintaining a file like that is a nightmare; we can barely maintain a simple icons.svg file.

At the same time, I have been coding and using git for many years, on many different projects, and I've never used .gitattributes. We use tools like Prettier on our project, which rewrites line endings to "lf", and we have devs on Windows; this has never caused any issues, and neither has VS Code. Git also automatically detects binary files like PNGs and shows text diffs for files like SVGs; I've never had to configure that.

So my question is: is it really necessary to have this file? It seems to me like signing up for a ton of completely unnecessary maintenance, when git is smart enough to figure out what it should or shouldn't do with a file.


Solution

  • is it really necessary to have this file?

    Yes, for any Git-related setting (eol, diff, merge drivers, content filters, ...) that you want every collaborator on the repository to follow.

    This differs from git config, which, for security reasons, remains local (because it can include sensitive information or dangerous directives).

    A .gitattributes file is part of your versioned source code, and contributes to establishing a common Git standard.
    For instance, I always put (as in VonC/gitcred/.gitattributes):

    *.bat   text eol=crlf
    *.go    text eol=lf
    

    That is because, no matter how your IDE/editor is configured, I need CRLF for my Windows bat scripts to run properly, and I prefer LF for Go files, which I edit on Windows or Linux. I have always considered local settings like core.autocrlf an antipattern, best left to false.
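    A minimal sketch to check what Git actually resolves for those two rules, assuming a POSIX shell and git on PATH. The file names run.bat and main.go are hypothetical, the repository is a throwaway in a temporary directory, and the -c user.name/user.email values only exist so the commit works without a configured identity:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# The same two rules as above.
printf '%s\n' '*.bat   text eol=crlf' '*.go    text eol=lf' > .gitattributes

# git check-attr reads .gitattributes from the working tree;
# the paths do not even need to exist.
attrs=$(git check-attr text eol -- run.bat main.go)
printf '%s\n' "$attrs"

# Commit a .bat file written with LF endings (safecrlf=false only silences
# the "LF will be replaced by CRLF" warning), delete it, and check it out again.
printf 'echo off\n' > run.bat
git -c core.safecrlf=false add .gitattributes run.bat
git -c user.name=demo -c user.email=demo@example.invalid commit -qm "Add run.bat"
rm run.bat
git checkout -- run.bat

# eol=crlf makes Git write CRLF to the working tree, whatever the local config says.
crs=$(tr -dc '\r' < run.bat | wc -c | tr -d ' ')
echo "CR characters in run.bat: $crs"
```

    check-attr should report "eol: crlf" for run.bat and "eol: lf" for main.go, and the re-checked-out run.bat contains one CR (one line, CRLF-terminated).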

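    But a .gitattributes file can declare many other Git elements, such as custom diff drivers (diff), merge drivers (merge), content filters (filter), and archive behavior (export-ignore, export-subst).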
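    One such element is export-ignore, which git archive honors. A minimal sketch, assuming a POSIX shell with git and tar on PATH; dev-notes.md is a hypothetical file you want versioned but left out of release archives:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

printf '# Project\n' > README.md
printf 'internal notes\n' > dev-notes.md
# Keep dev-notes.md under version control, but exclude it from "git archive" output.
printf 'dev-notes.md export-ignore\n' > .gitattributes

git add .
git -c user.name=demo -c user.email=demo@example.invalid commit -qm "Initial commit"

# List the archive contents: README.md and .gitattributes, but no dev-notes.md.
listing=$(git archive --format=tar HEAD | tar -tf -)
printf '%s\n' "$listing"
```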

    The .gitattributes file is not "mandatory", but a useful tool in the Git toolbox, one that can be shared safely in a project code base.
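    To see the "shared safely" part in practice, here is a minimal sketch (POSIX shell and git assumed; the src/dst repository names and run.bat are hypothetical): a local git config value stays in the source repository's .git/config, while the committed .gitattributes arrives with every clone.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"

# Local setting: stored in src/.git/config, never committed or cloned.
git config core.autocrlf false
# Shared setting: a committed file, part of the project's history.
printf '*.bat text eol=crlf\n' > .gitattributes
git add .gitattributes
git -c user.name=demo -c user.email=demo@example.invalid commit -qm "Add .gitattributes"

git clone -q "$work/src" "$work/dst"

# The clone sees the attribute rule...
shared=$(git -C "$work/dst" check-attr eol -- run.bat)
# ...but has no core.autocrlf value in its own repository config.
local_cfg=$(git -C "$work/dst" config --local --get core.autocrlf || echo '<unset>')

printf '%s\n' "$shared" "core.autocrlf in clone: $local_cfg"
```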


    And you can read it even in bare repositories:

    With Git 2.43 (Q4 2023), the attribute subsystem learned to honor attr.tree configuration that specifies which tree to read the .gitattributes files from.

    See commit 9f9c40c, commit 2386535 (13 Oct 2023) by John Cai (john-cai).
    (Merged by Junio C Hamano -- gitster -- in commit 26dd307, 30 Oct 2023)

    attr: read attributes from HEAD when bare repo

    Signed-off-by: John Cai

    The motivation for 44451a2 ("attr: teach "--attr-source=<tree>" global option to "git"", 2023-05-06, Git v2.41.0-rc1 -- merge) was to make it possible to use gitattributes with bare repositories.

    To make it easier to read gitattributes in bare repositories however, let's just make HEAD:.gitattributes the default.
    This is in line with how mailmap works, 8c473ce ("mailmap: default mailmap.blob in bare repositories", 2012-12-13, Git v1.8.2-rc0 -- merge).

    And, still with Git 2.43 (Q4 2023):

    attr: add attr.tree for setting the treeish to read attributes from

    Signed-off-by: John Cai

    44451a2 ("attr: teach "--attr-source=<tree>" global option to "git"", 2023-05-06, Git v2.41.0-rc1 -- merge) provided the ability to pass in a treeish as the attr source.
    In the context of serving Git repositories as bare repos like we do at GitLab however, it would be easier to point --attr-source to HEAD for all commands by setting it once.

    Add a new config attr.tree that allows this.

    git config now includes in its man page:

    attr.tree

    A reference to a tree in the repository from which to read attributes, instead of the .gitattributes file in the working tree.

    In a bare repository, this defaults to HEAD:.gitattributes.

    If the value does not resolve to a valid tree object, an empty tree is used instead.
    When the GIT_ATTR_SOURCE environment variable or --attr-source command line option are used, this configuration variable has no effect.


    However, Git 2.46 (Q3 2024), batch 3 notes:

    Git 2.43 started using the tree of HEAD as the source of attributes in a bare repository, which has severe performance implications.
    For now, revert the change, without ripping out a more explicit support for the attr.tree configuration variable.

    See commit 51441e6 (03 May 2024) by Junio C Hamano (gitster).
    (Merged by Junio C Hamano -- gitster -- in commit b077cf2, 13 May 2024)

    stop using HEAD for attributes in bare repository by default

    With 2386535 ("attr: read attributes from HEAD when bare repo", 2023-10-13, Git v2.43.0-rc0 -- merge listed in batch #22), we started to use the HEAD tree as the default attribute source in a bare repository.
    One argument for such a behaviour is that it would make things like "git archive" run in bare and non-bare repositories for the same commit consistent.
    This change was merged to Git 2.43 but without an explicit mention in its release notes.

    It turns out that this change destroys performance of shallowly cloning from a bare repository.
    As the "server" installations are expected to be mostly bare, and "git pack-objects", which is the core of driving the other side of "git clone" and "git fetch", wants to see if a path is set not to delta with blobs from other paths via the attribute system, the change forces the server side to traverse the tree of the HEAD commit needlessly to find out whether each and every path of the objects it sends out has the attribute that controls the deltification.
    Given that (1) most projects do not configure such an attribute, and (2) it is dubious for the server side to honor such an end-user supplied attribute anyway, this was a poor choice of the default.
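    The attribute in question is delta: per the gitattributes documentation, delta compression is not attempted for blobs whose paths have delta set to false. A minimal sketch of such a rule and how Git reports it, with a hypothetical *.bin pattern (POSIX shell and git assumed):

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical rule: never try to delta-compress *.bin blobs against other blobs.
printf '*.bin -delta\n' > .gitattributes

# "unset" means the attribute is explicitly false; "unspecified" means no rule matched.
delta=$(git check-attr delta -- firmware.bin notes.txt)
printf '%s\n' "$delta"
```

    This per-path lookup is what pack-objects performs for every object it sends, which is why having to read attributes from the HEAD tree of a bare repository became costly.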

    To mitigate the current situation, let's revert the change that uses the tree of HEAD in a bare repository by default as the attribute source.
    This will help most people who have been happy with the behaviour of Git 2.42 and before.

    Two things to note:

    • If you are stuck with a version of Git 2.43 or newer that is older than the release this fix appears in, you can explicitly set the attr.tree configuration variable to point at an empty tree object, i.e.

      $ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
      
    • If you like the behaviour we are reverting, you can explicitly set the attr.tree configuration variable to HEAD, i.e.

      $ git config attr.tree HEAD
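    The long hexadecimal value in the first command is Git's well-known empty tree object ID. A minimal sketch showing where it comes from, assuming a POSIX shell with GNU coreutils' sha1sum (on macOS, shasum gives the same digest) and the default SHA-1 object format:

```shell
set -eu

# A Git object ID is the SHA-1 of "<type> <size>\0<content>";
# an empty tree has no content, so the header is all that gets hashed.
empty_tree=$(printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1)
printf '%s\n' "$empty_tree"

# Git computes the same ID from empty input (no repository needed, nothing written).
from_git=$(git hash-object -t tree /dev/null)
printf '%s\n' "$from_git"
```

    Since that tree has no entries, pointing attr.tree at it leaves Git no in-tree .gitattributes rules to look up.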
      

    The right fix for this is to optimize the code paths that allow accesses to attributes in tree objects, but that is a much more involved change and is left as a longer-term project, outside the scope of this "first step" fix.