find, non-ascii-characters

Find files with non-ASCII chars in the file name


Is there a way to find files that have non-ASCII chars in their names? I could of course pipe the output of find through perl and filter there, but for efficiency I'd like to do it all within find. I tried the following:

find . -type f -name '*[^[:ascii:]]*'

It doesn't work at all.

Edit:

I'm now trying to make use of

find . -type f -regex '.*[^[:ascii:]].*'

-regex takes an Emacs-style regexp, which has an [:ascii:] class, but the expression I'm trying to use doesn't work either.

Edit 2:

LC_COLLATE=C find . -type f -regex '.*[^!-~].*'

This matches files with non-ASCII chars (complete voodoo...), but it also matches files with a space in the name.


Solution

  • This seems to work for me in both default and posix-extended mode:

    LC_COLLATE=C find . -regex '.*[^ -~].*'
    

    There could be locale-related issues, though. I don't have a large corpus of non-ASCII filenames to test it on, but it catches the ones I have.
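
For anyone who wants to try this on a throwaway directory first, here is a minimal sketch. The range from space (0x20) to tilde (0x7E) covers printable ASCII, so the negated class matches any byte outside it, and a name containing only a space is no longer reported. Two things here are assumptions rather than part of the answer above: find_non_ascii is a hypothetical helper name, and LC_ALL=C is swapped in for LC_COLLATE=C, because LC_ALL overrides LC_COLLATE when it is set in the environment and also makes character handling byte-wise. -regex itself is an extension (GNU and BSD find have it), not POSIX.

```shell
#!/bin/sh
# Sketch: list regular files whose path contains any byte outside
# printable ASCII (space 0x20 through tilde 0x7E).
# find_non_ascii is a hypothetical helper name, not part of find.
# Assumption: LC_ALL=C replaces the answer's LC_COLLATE=C so that the
# bracket range is compared byte by byte regardless of the caller's locale.
find_non_ascii() {
    LC_ALL=C find "${1:-.}" -type f -regex '.*[^ -~].*'
}

# Demo in a scratch directory with three test names.
demo_dir=$(mktemp -d)
# "caf\303\251" is "café" written as raw UTF-8 bytes in octal.
touch "$demo_dir/plain.txt" \
      "$demo_dir/with space.txt" \
      "$demo_dir/$(printf 'caf\303\251.txt')"

# Only the third file should be listed.
find_non_ascii "$demo_dir"

rm -rf "$demo_dir"
```

Note that -regex is matched against the whole path, not just the file name, so every file under a directory with a non-ASCII name is reported as well. Control characters (tab, newline and so on) also fall outside the range and will match.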