I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:
#!/bin/bash
FILES=`ls`
for i in $FILES
do
    ########GREP SYNTAX###########
    if grep -qv -e[:cntrl:] $i
    ########/GREP SYNTAX##########
    then
        mv $i $i-plaintext.txt
    else
        mv $i $i-binary.txt
    fi
done
In the grep syntax line, I have also tried the same without the -v flag and with the branches of the if statement swapped, as well as both combinations of the same with [:alnum:] and [:print:]. All six of these variations produce some files labelled binary which consist solely of plaintext and some files labelled plaintext which contain at least one non-printable character.
I need to find a way to identify files that contain only printable characters, i.e. A-Z, a-z, 0-9, punctuation, spaces and newlines. Any file containing a character outside this set should be classified as binary.
I've been bashing my head against a wall trying to sort this for half a day. Help! Thanks in advance, Rik
First, you can (and should) write
for f in *
instead of putting the output of ls in a variable. The chief reason for doing this is to be able to handle filenames that include spaces.
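To see why the glob is safer, here is a small sketch that counts loop iterations both ways. The temporary directory and the two file names are made up for the demonstration; they are not from the question.

```shell
#!/bin/bash
# Throwaway directory with one file name that contains a space.
tmpdir=$(mktemp -d)
touch "$tmpdir/my notes.txt" "$tmpdir/data.bin"

# Word splitting on the output of ls breaks "my notes.txt" into two words.
ls_count=0
for i in $(ls "$tmpdir")
do
    ls_count=$((ls_count + 1))
done

# The glob hands each file name to the loop as a single word.
glob_count=0
for f in "$tmpdir"/*
do
    glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count iterations, glob loop: $glob_count iterations"
rm -rf "$tmpdir"
```

With those two files and the default IFS, the ls loop should run three times (data.bin, my, notes.txt) and the glob loop twice.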
Second, you need to enclose the character class in a second set of brackets, or grep is going to treat those characters as literals. I would also enclose the pattern in single quotes to protect it from being interpreted by the shell. Don't use -v; instead, negate the print class and see if that works for you:
if grep -aq -e '[^[:print:]]' "$f"
And as shown in that line, always quote variables when they contain filenames.
mv "$f" "$f-plaintext.txt"
To keep grep from complaining about binary files, use -a.
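A quick way to check the pattern before renaming anything is to try it on two small sample files. This is only a sketch: has_nonprint is a hypothetical helper name, and the sample files and their contents are invented for the test.

```shell
#!/bin/bash
# Two throwaway sample files (names and contents are made up).
tmpdir=$(mktemp -d)
printf 'Hello, world 123!\nsecond line\n' > "$tmpdir/text"
printf 'abc\001def\n' > "$tmpdir/ctrl"

# Exit status 0 when the file has at least one character outside [:print:].
# Newlines separate lines for grep, so they are never tested against the pattern.
has_nonprint() {
    grep -aq -e '[^[:print:]]' "$1"
}

for f in "$tmpdir/text" "$tmpdir/ctrl"
do
    if has_nonprint "$f"
    then
        echo "${f##*/}: binary"
    else
        echo "${f##*/}: plaintext"
    fi
done
rm -rf "$tmpdir"
```

This should report text as plaintext and ctrl as binary. Note that a tab or a carriage return is not in [:print:] either, so a file containing one would also be reported as binary, which matches the character set listed in the question.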
The variable i is often used for an integer or an index. Use f or file instead.
Finally:
#!/bin/bash
for f in *
do
    if grep -aq -e '[^[:print:]]' "$f"
    then
        mv "$f" "$f-binary.txt"
    else
        mv "$f" "$f-plaintext.txt"
    fi
done
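If the directory also holds subdirectories, or the files may contain non-ASCII bytes, a slightly more defensive variant of the script above might look like the sketch below. Everything beyond the original loop is my assumption rather than part of the answer: classify_dir is a hypothetical function name, the [ -f ] test skips anything that is not a regular file, and LC_ALL=C is added so that grep treats each byte as one character regardless of the user's locale.

```shell
#!/bin/bash
# classify_dir is a hypothetical helper; pass it the directory to process.
classify_dir() {
    local dir=$1 f
    for f in "$dir"/*
    do
        # Skip directories and anything else that is not a regular file.
        [ -f "$f" ] || continue
        # Assumption: in the C locale every byte is a character, so any byte
        # outside printable ASCII makes the file count as binary.
        if LC_ALL=C grep -aq -e '[^[:print:]]' "$f"
        then
            mv -- "$f" "$f-binary.txt"
        else
            mv -- "$f" "$f-plaintext.txt"
        fi
    done
}

# Usage: classify_dir /path/to/files
```

The glob is expanded once before the loop starts, so files renamed during the run are not picked up a second time. With LC_ALL=C, text containing accented or other non-ASCII characters is classified as binary; drop that assignment if such files should count as plaintext.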