I am creating a filter for files coming onto a Unix machine. I only want to allow plain text files that do not look like scripts to pass through.
To check for plain text, I am testing the file's executable bit and using Perl's -T file test. (I understand this is not 100% reliable, but it will catch the binary files I most want to avoid.) I think this will be sufficient, but any suggestions are welcome.
My main question is how to recognize when a plain text file is a script. Every script I've ever written has started with a #! line, so my first thought is to read the file's first line and block any file that begins with it. Are there common non-script plain text files that start with a #! line and would be flagged as false positives? Are there better or additional methods of identifying a script?
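For concreteness, here is a minimal sketch of the two checks described above, written in Python rather than Perl. The function names (has_shebang, has_exec_bit, should_block) are hypothetical, and the sketch assumes that looking at the first two bytes is enough to detect a #! line. It does not attempt to reproduce Perl's -T heuristic.

```python
import os
import stat


def has_shebang(path):
    """Return True if the file's first two bytes are '#!'."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"#!"


def has_exec_bit(path):
    """Return True if any user/group/other execute bit is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def should_block(path):
    """Hypothetical policy: block executable files and files with a shebang."""
    return has_exec_bit(path) or has_shebang(path)
```

Reading in binary mode avoids decoding errors on files that turn out not to be text after all.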
That's what the file command (see Wikipedia) is for. It recognizes much more than just the shebang (#!), and can tell you what kind of script it is, if any.
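As a rough illustration, the file command can be called from a filter program. The sketch below uses Python and assumes the widely deployed implementation of file that supports the --brief and --mime-type options (these are not part of the POSIX specification). The helper names mime_type and is_plain_text are hypothetical, and treating only text/plain as acceptable is an assumed policy, not something file itself prescribes.

```python
import subprocess


def mime_type(path):
    """Ask file(1) for the MIME type of path, e.g. 'text/plain'."""
    result = subprocess.run(
        # "--" guards against paths that begin with a dash.
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def is_plain_text(mime):
    """Assumed policy: accept only files reported as exactly text/plain."""
    return mime == "text/plain"
```

A caller would pass a file through only when is_plain_text(mime_type(path)) is true. Because file works from heuristics, a script with no recognizable signature may still be reported as text/plain, so this complements the shebang check rather than replacing it.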