Tags: bash, git, filtering, gitignore, pre-commit-hook

From a shell script, how can I filter out files matching .gitignore?


I'm maintaining some git pre-commit hooks, and I keep wanting to do something for all files that are or should be under revision control. Knowing the project structure lets me do a decent job of this, but all the info about build system output directories, test log files, and editor droppings is already in .gitignore.

Is there a simple way to filter file paths based on whether they match a pattern in .gitignore?

In other words, what can I substitute for WHAT GOES HERE in

find "$(git rev-parse --show-toplevel)" --my --filters | WHAT GOES HERE

so that I get all, and only, the non-ignored files that match my filters?

I think I can get a negative filter that I might tee into comm by doing

... | xargs git ls-files -X .gitignore -i

but I was hoping for a single step.


Solution

  • UPDATE - As noted in an exchange of comments on this answer, the check-ignore command is documented as listing ignored files, but if your ignore rules include exceptions (patterns that start with !), files matching those patterns are printed as well, even though they are not ignored. Some of the docs can be read as describing this behavior, while other parts of the same docs strongly imply that it is not intended, so I regard it as a bug. Whichever interpretation you prefer, it's how the software works.

    So... If you don't use ! patterns, the below works as advertised. If you do use ! patterns, then you could work around this by using --verbose output and post-processing to see if a matching pattern is an inclusion or an exclusion.
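A rough sketch of that post-processing follows. With --verbose --non-matching, check-ignore prints one record per path in the form source:linenum:pattern, then a tab, then the path, and leaves the first three fields empty when no pattern matched. The helper name keep_unignored is hypothetical, and the parsing assumes paths without tabs or newlines and ignore-file names without colons.

```shell
# keep_unignored (hypothetical helper): read the output of
#   git check-ignore --verbose --non-matching
# on stdin and print only the paths that are NOT ignored.
# Assumes no tabs/newlines in paths and no colons in ignore-file names.
keep_unignored() {
  awk -F'\t' '
    $1 == "::"             { print $2; next }  # no pattern matched at all
    $1 ~ /^[^:]*:[0-9]+:!/ { print $2 }        # deciding pattern is a "!" exception
  '
}
```

It could then sit at the end of the pipeline, for example: find . -path ./.git -prune -o -type f -print | git check-ignore --verbose --non-matching --stdin | keep_unignored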


    Getting the exact behavior you want with ls-files may not be as easy as it seems. To start, you probably don't mean -i since that would only list ignored files...

    But anyhow, a different (more "one-step") approach would be:

    In your find command, you can use an -exec action to call git check-ignore for each file matching your other filters.

    find "$(git rev-parse --show-toplevel)" <filters> -not -exec git check-ignore -q {} \; <actions>
    

    This will properly interpret the ignore rules from all sources.
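To make the placeholders concrete, here is a minimal sketch in which the filters are "regular files whose name matches a pattern" and the action is -print. The function name list_unignored and its pattern argument are illustrative assumptions, not part of the question. It has to be run from inside the work tree.

```shell
# list_unignored (illustrative name): print regular files under the
# repository root whose name matches the given glob and that Git does
# not ignore. check-ignore -q exits 0 for ignored paths, so -not keeps
# only the paths that are not ignored.
list_unignored() {
  find "$(git rev-parse --show-toplevel)" -type f -name "$1" \
    -not -exec git check-ignore -q {} \; -print
}
```

For example, list_unignored '*.py' would print the Python files that are not ignored. Note that this spawns one git process per candidate file, which may be slow on large trees.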

    By default that also means that if a file is in the index, it does not show up as "excluded" even if it's in .gitignore, which reflects how ignore rules really behave.

    But if you want to not process files matching the ignore pattern even though they're in the index and therefore are not really ignored, you can modify the command to do that:

    find "$(git rev-parse --show-toplevel)" <filters> -not -exec git check-ignore -q --no-index {} \; <actions>
    

    Since you started from using find, I'm assuming you only care about files that are currently in your work tree in any case.

    You may also want to exclude the .git directory. If .git is the only "dot-file" in your top-level directory, you could say

    find "$(git rev-parse --show-toplevel)"/* <filters> -not -exec git check-ignore -q --no-index {} \; <actions>
    

    If you can't make that assumption, then you could use

    find "$(git rev-parse --show-toplevel)" -path "$(git rev-parse --show-toplevel)"/.git -prune -o <filters> -not -exec git check-ignore -q --no-index {} \; <actions>
    

    which is a bit ugly due to the two calls to rev-parse. You could instead capture the rev-parse result in a shell variable before running find, but that may run afoul of your "one step" preference. Another option, if you can safely ignore any directory named .git, is:

    find "$(git rev-parse --show-toplevel)" -path '*/.git' -prune -o <filters> -not -exec git check-ignore -q --no-index {} \; <actions>
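
If the "one step" preference can be relaxed, the variable-capture variant mentioned above might look like the following sketch. The function name unignored_files and the variable top are illustrative, and -type f with -print stand in for your own filters and actions.

```shell
# unignored_files (illustrative name): print every regular file in the
# work tree that matches no ignore rule, skipping the top-level .git
# directory. --no-index also filters out files that are tracked but
# match an ignore pattern.
unignored_files() {
  top=$(git rev-parse --show-toplevel) || return
  # Caveat: -path treats $top as a glob, so this assumes the path has
  # no glob metacharacters such as * ? or [.
  find "$top" -path "$top/.git" -prune -o -type f \
    -not -exec git check-ignore -q --no-index {} \; -print
}
```

Calling rev-parse once also lets the function fail early, through the || return, when it is run outside a repository.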