Tags: bash, grep, non-printable

Locate files which ONLY contain printable characters in bash script


I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:

#!/bin/bash
FILES=`ls`
for i in $FILES
do
    ########GREP SYNTAX###########
    if grep -qv -e[:cntrl:] $i
    ########/GREP SYNTAX##########
    then
        mv $i $i-plaintext.txt
    else
        mv $i $i-binary.txt
    fi
done

In the grep syntax line, I have also tried the same without the -v flag and with the branches of the if statement swapped, as well as both combinations of the same with [:alnum:] and [:print:]. All six of these variations produce some files labelled binary which consist solely of plaintext, and some files labelled plaintext which contain at least one non-printable character.

I need to find a way to identify files that only contain printable characters, i.e. A-Z, a-z, 0-9, punctuation, spaces and new lines. Any file containing a character that is not in this set should be classified as binary.

I've been bashing my head against a wall trying to sort this for half a day. Help! Thanks in advance, Rik


Solution

  • First, you should use

    for f in *
    

    instead of putting the output of ls in a variable. The chief reason for doing this is to be able to handle filenames that include spaces.
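
    To see why the glob matters, here is a minimal sketch that counts loop iterations both ways. The scratch directory and the file names are made up for the demo; nothing in your own directory is touched.

    ```shell
    #!/bin/bash
    # Sketch: compare looping over `ls` output with looping over a glob.
    # demo_dir and the two file names are invented for this demo.
    demo_dir=$(mktemp -d)
    touch "$demo_dir/plain.txt" "$demo_dir/two words.txt"

    # Unquoted command substitution is split on whitespace,
    # so "two words.txt" arrives as two separate items.
    ls_count=0
    for i in $(ls "$demo_dir"); do
        ls_count=$((ls_count + 1))
    done

    # A glob expands to exactly one item per file, spaces and all.
    glob_count=0
    for f in "$demo_dir"/*; do
        glob_count=$((glob_count + 1))
    done

    echo "ls loop: $ls_count items, glob loop: $glob_count items"
    rm -rf "$demo_dir"
    ```

    With one of the two file names containing a space, the ls loop should report 3 items and the glob loop 2.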

    Second, a character class such as [:print:] is only valid inside a bracket expression, so it needs an outer set of brackets: [[:print:]]. Without them, grep is going to look at those characters as literals. I would also enclose the pattern in single quotes to protect it from the shell. Rather than using -v, negate the print class inside the brackets, so that grep succeeds as soon as it finds one non-printable character:

    if grep -aq -e '[^[:print:]]' "$f"
    

    And as shown in that line, always quote variables when they contain filenames.

    mv "$f" "$f-plaintext.txt"
    

    To keep grep from complaining about binary files, use -a, which makes it process them as if they were text.
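
    If you want to check the pattern before pointing it at real files, something like the following sketch feeds a few samples to grep on stdin. contains_nonprintable is just a name I made up for the demo, and the sample strings are invented.

    ```shell
    #!/bin/bash
    # Sketch: run the grep command from this answer against small samples.
    # contains_nonprintable is a hypothetical helper name used only here.
    contains_nonprintable() {
        grep -aq -e '[^[:print:]]'
    }

    # Letters, digits, punctuation, spaces and the newline: no match.
    if printf 'Hello, world 123!\n' | contains_nonprintable; then
        plain_result=binary
    else
        plain_result=plaintext
    fi

    # \001 is a control character, so the negated class matches it.
    if printf 'Hello\001world\n' | contains_nonprintable; then
        ctrl_result=binary
    else
        ctrl_result=plaintext
    fi

    # A tab is not in [:print:] either, so it also counts as non-printable.
    if printf 'col1\tcol2\n' | contains_nonprintable; then
        tab_result=binary
    else
        tab_result=plaintext
    fi

    echo "$plain_result $ctrl_result $tab_result"
    ```

    Note the last case: a file containing tabs ends up classified as binary. That agrees with the character set listed in the question, but it is worth knowing before you run this over real data.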

    The variable i is conventionally used for an integer or an index. For a filename, f or file reads better.

    Finally:

    #!/bin/bash
    for f in *
    do
        # grep succeeds if the file contains at least one non-printable character
        if grep -aq -e '[^[:print:]]' "$f"
        then
            mv "$f" "$f-binary.txt"
        else
            mv "$f" "$f-plaintext.txt"
        fi
    done
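
    A slightly more defensive variant, as a sketch only: it skips anything that is not a regular file, passes -- so that a name starting with a dash cannot be read as an option, and sets LC_ALL=C for grep. The LC_ALL=C part is my assumption about what you want: in a UTF-8 locale, GNU grep may not match bytes that do not form a valid character, whereas in the C locale every byte is its own character and bytes above 0x7F are non-printable. classify_dir and the sample files are hypothetical names for the demo.

    ```shell
    #!/bin/bash
    # Sketch of a more defensive variant; classify_dir is a hypothetical name.
    classify_dir() {
        local dir=$1 f
        for f in "$dir"/*; do
            # Skip directories and anything else that is not a regular file.
            [ -f "$f" ] || continue
            # LC_ALL=C makes grep work byte by byte (assumption: GNU grep),
            # so the result does not depend on the user's locale.
            if LC_ALL=C grep -aq -e '[^[:print:]]' -- "$f"; then
                mv -- "$f" "$f-binary.txt"
            else
                mv -- "$f" "$f-plaintext.txt"
            fi
        done
    }

    # Try it on a scratch directory with made-up sample files.
    scratch=$(mktemp -d)
    printf 'just text\n' > "$scratch/a"
    printf 'text\001with a control character\n' > "$scratch/b"
    mkdir "$scratch/subdir"
    classify_dir "$scratch"
    ls "$scratch"
    ```

    The glob is expanded once, before the loop starts, so the renamed files are not picked up again on a later iteration.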