Tags: windows, bash, git, git-bash, pre-commit-hook

Git pre-commit hook that searches for non-UTF-8 files among modified/added files (and rejects the commit if it finds any)


I'm using Git for Windows (and TortoiseGit).

My goal is to prevent any commit that has at least one non-UTF-8 file among its modified/added files.

  • Enumerating modified/added files: I've found the following code

    { git diff --name-only ; git diff --name-only --staged ; }
    

    Is this the best (correct and most concise) approach?

  • Searching for non-UTF-8 files: I've found the following code

    { git diff --name-only ; git diff --name-only --staged ; } | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {} - is non-UTF8!"
    

    If I start Git Bash in my repository root folder, it works (each non-UTF-8 file is displayed). So I renamed .git/hooks/pre-commit.sample to .git/hooks/pre-commit and pasted the code above into it. After committing changes, nothing special is displayed in the TortoiseGit commit GUI window, so it looks like the pre-commit hook is not working correctly.

  • Rejecting the commit if there is any non-UTF-8 file: After displaying all non-UTF-8 files, the commit should be rejected. But I have no idea how to do this (return some exit code, but how?).

So any help is appreciated.


Solution

  • So the answer is (thanks to phd and great thanks to torek for their useful notes):

    git diff --name-only --staged --diff-filter=d | xargs -I {} bash -c \
        "iconv -f utf-8 -t utf-16 {} &>/dev/null || { echo {} - is non-UTF8!; exit 1; }"
    

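    This code iterates over every file staged for the commit except deleted ones (i.e. added, modified, copied and renamed files) and checks whether each one is valid UTF-8. Only staged files are checked, because unstaged changes are not part of the commit. Every non-UTF-8 file is listed and the commit is aborted: when any of the bash invocations exits with status 1, xargs itself exits with a non-zero status, and Git aborts the commit whenever the pre-commit hook exits non-zero.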
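
The same check could also be written as a small hook script, which may be easier to maintain than the one-liner. This is only a sketch under a few assumptions: the function names is_utf8, check_files and main are made up for illustration, file names are assumed not to contain newlines, iconv is assumed to be on the PATH (as it is in Git Bash), and the working-tree copy of each file is checked, just as in the one-liner, rather than the staged content.

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-commit: reject the commit if any staged,
# non-deleted file is not valid UTF-8.
# is_utf8, check_files and main are illustrative names, not Git features.

# Succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f utf-8 -t utf-16 "$1" >/dev/null 2>&1
}

# Reads file names from stdin, one per line (assumes no newlines in names).
# Prints every offender to stderr and returns 1 if there was at least one.
check_files() {
    status=0
    while IFS= read -r file; do
        if ! is_utf8 "$file"; then
            echo "$file - is non-UTF8!" >&2
            status=1
        fi
    done
    return "$status"
}

# Lists staged files except deleted ones and checks each of them.
# core.quotePath=false keeps non-ASCII file names unescaped in the output.
main() {
    git -c core.quotePath=false diff --name-only --staged --diff-filter=d |
        check_files
}

# Run the check only when this file is invoked as the hook, so the
# functions above can also be sourced and tried out by hand.
case "$0" in
    *pre-commit) main || exit 1 ;;
esac
```

Passing each name to iconv as a quoted "$file" argument, instead of substituting it into a bash -c string, means names containing spaces or quotes should not break the check. A non-zero exit status from the script is what makes Git abort the commit.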