Tags: git, laravel, github, gitattributes, github-linguist

.gitattributes linguist attributes standard


I've just created a new Laravel project, and I'm setting up my .gitattributes Linguist attributes. Problem is, I'm not sure which ones to set for which files/directories.

The default .gitattributes file looks like this:

* text=auto
*.css linguist-vendored
*.scss linguist-vendored
*.js linguist-vendored
CHANGELOG.md export-ignore

First of all, why are all .css, .scss and .js files set to linguist-vendored? Not only would this exclude all such user-created files from Linguist, but Laravel also only has respectively 1, 2 and 4 of these files, how would this significantly impact Linguist stats?

Secondly, is there any convention/standard regarding which files should be marked as linguist-vendored? Should I mark all Laravel files, only the ones that I don't modify, only the vendor directory, or none at all? Or maybe even something else?

Thanks in advance!


Solution

  • First of all, why are all .css, .scss and .js files set to linguist-vendored?

    From what I understand from reading the comments on commits e3630a5 and 93876d6, the authors wanted to ensure that Laravel projects are tagged as PHP (GitHub tags a project with the top language in its language statistics).

    Not only would this exclude all such user-created files from Linguist, but Laravel also only has respectively 1, 2 and 4 of these files, how would this significantly impact Linguist stats?

    In Linguist, language statistics are derived from the total size of the files in each language, not from the number of files (you can read my answer on how Linguist works for further details). Thus, a single large file can noticeably change the language statistics.

    In Laravel's case, if I remove the Linguist overrides and run Linguist again on the repository, I get the following statistics:

    94.11%  PHP
    3.30%   HTML
    1.42%   JavaScript
    0.68%   Vue
    0.49%   CSS
    

    So there's no need to use Linguist overrides to have the repository tagged as PHP. It's possible that there were larger files when the overrides were added though.

    Note that the Laravel repository contains two large minified files. Linguist recognizes that they are minified and marks them as generated, thus excluding them from the statistics automatically.
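
    If that automatic detection ever misses a generated file in your own project, an explicit override is one option. A minimal sketch, assuming hypothetical build output paths (they are not taken from the Laravel repository); `linguist-generated` is the attribute Linguist documents for this purpose:

    ```gitattributes
    # Hypothetical build artifacts; replace with whatever your build actually emits.
    public/js/app.js linguist-generated
    public/css/app.css linguist-generated
    ```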

    Secondly, is there any convention/standard regarding which files should be marked as linguist-vendored? Should I mark all Laravel files, only the ones that I don't modify, only the vendor directory, or none at all? Or maybe even something else?

    That's really up to you; as far as I know, there's no convention on what constitutes vendored code.

    In Linguist, we try to mark as vendored all third-party code that may affect the statistics but wasn't authored by the repository's owner. You can change this default behavior with overrides, though.
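
    As an illustration only, a project that wants its own front-end code counted could drop the blanket `*.css`, `*.scss` and `*.js` rules and scope the overrides to third-party code that is actually committed. The directory and file names below are hypothetical and only show the syntax:

    ```gitattributes
    * text=auto

    # Hypothetical directory of third-party libraries committed to the repository.
    public/js/lib/** linguist-vendored

    # Hypothetical file you wrote yourself that sits under a path assumed to match
    # Linguist's default vendored patterns; the leading "-" unsets the attribute.
    resources/js/vendor/my-widget.js -linguist-vendored

    CHANGELOG.md export-ignore
    ```

    Running `git check-attr linguist-vendored -- <path>` shows whether the attribute applies to a given file. The statistics on GitHub are only recalculated after the .gitattributes change has been pushed.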