language-agnostic, file-type

Method of identifying plaintext files as scripts


I am creating a filter for files coming onto a Unix machine. I only want to allow plain text files that do not look like scripts to pass through.

To check for plain text, I am checking the file's executable bit and using Perl's -T file test. (I understand this is not 100% reliable, but it will catch the binary files I most want to avoid.) I think this will be sufficient, but any suggestions are welcome.
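
For illustration, here is a rough Python sketch of the two checks just described. It is not Perl's actual -T implementation: the 512-byte block size, the 30% threshold, and the names has_exec_bit, looks_like_text, and is_plain_text_file are all assumptions made for this example.

```python
import os
import stat

# Bytes this sketch treats as ordinary text: printable ASCII plus common
# whitespace/control characters. This set is an assumption of the example.
_TEXT_BYTES = bytes(range(0x20, 0x7F)) + b"\t\n\r\f\b"


def has_exec_bit(path):
    """Return True if any of the user/group/other execute bits is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def looks_like_text(data, threshold=0.30):
    """Heuristic: no NUL bytes and at most `threshold` 'odd' bytes."""
    if not data:
        return True  # an empty block contains nothing binary
    if b"\x00" in data:
        return False
    # Delete every ordinary text byte; whatever remains counts as odd.
    odd = data.translate(None, _TEXT_BYTES)
    return len(odd) / len(data) <= threshold


def is_plain_text_file(path, block_size=512):
    """Apply the heuristic to the first block of the file only."""
    with open(path, "rb") as fh:
        return looks_like_text(fh.read(block_size))
```

Note that this sketch counts every byte above 0x7F as odd, so short UTF-8 text with many non-ASCII characters could be misclassified as binary.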

My main question is about recognizing when a plain text file is a script. Every script I've ever written has started with a #! line, so my first thought is to read the file's first line and block any file whose first line starts with #!. Are there common non-script plain text files that start with a #! line and that I would flag as false positives? Are there better or additional methods of identifying a script?
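
The first-line check I have in mind would look roughly like this in Python. The function names has_shebang and file_has_shebang are made up for this sketch, and it only looks at the first two bytes of the file.

```python
def has_shebang(first_bytes):
    """Return True if the given leading bytes begin with '#!'."""
    return first_bytes[:2] == b"#!"


def file_has_shebang(path):
    """Read only the first two bytes of the file and test them."""
    with open(path, "rb") as fh:
        return has_shebang(fh.read(2))
```

This only catches scripts that declare an interpreter. A file with no #! line can still be run by passing it to an interpreter explicitly (for example `sh somefile`), so a check like this would miss those.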


Solution

  • That's what the file command (see Wikipedia) is for. It recognizes much more than just the shebang (#!), and can tell you what kind of script a file is, if it is one.
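
A minimal sketch of calling file from Python, assuming the widely used file(1) implementation whose -b option prints the description without the leading file name. The helper names file_description and describes_script are hypothetical, and matching on the word "script" in the output is an assumption of this example, not something file itself guarantees.

```python
import re
import shutil
import subprocess


def file_description(path):
    """Return the description printed by `file -b`, or None if `file` is missing."""
    exe = shutil.which("file")
    if exe is None:
        return None
    # "--" ends option parsing so a path starting with "-" is not read as a flag.
    result = subprocess.run(
        [exe, "-b", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def describes_script(description):
    """Assumed rule: the description contains 'script' as a separate word."""
    return re.search(r"\bscript\b", description, re.IGNORECASE) is not None
```

For a typical shell script, file prints something like "Bourne-Again shell script, ASCII text executable", while an ordinary text file is reported as "ASCII text". The exact wording varies between versions, so any keyword match should be checked against the file version installed on the target machine.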