php html forms textarea blacklist

Detecting specific words in a textarea submission


I have a new feature on my site where users can submit any text via a textarea (I have already blocked all HTML). The main problem I still have, though, is that they could type "http://somewhere.com", which I want to prevent. I also want to blacklist specific words. This is what I had before:

if (strpos($entry, "http://" or ".com" or ".net" or "www." or ".org" or ".co.uk" or "https://") !== true) {
    die ('Entries cannot contain links!');
}

However, that didn't work: it stopped users from submitting any text at all. So my question is simple: how can I do this?
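For reference, the substring idea from the question only works if each fragment is tested on its own and the result of stripos() (the case-insensitive strpos()) is compared against false. A minimal sketch; containsLinkFragment() is a hypothetical helper name. Plain substring matching is blunt: a sentence with a missing space such as "end.Come back" contains ".com" when matched case-insensitively and would be rejected.

```php
<?php

// Hypothetical helper: returns true if $entry contains any of the
// link-like fragments listed in the question (case-insensitive).
function containsLinkFragment(string $entry): bool
{
    $fragments = ['http://', 'https://', 'www.', '.com', '.net', '.org', '.co.uk'];

    foreach ($fragments as $fragment) {
        // stripos() returns the match position (which may be 0) or false,
        // so the result must be compared strictly against false.
        if (stripos($entry, $fragment) !== false) {
            return true;
        }
    }

    return false;
}

// Usage, mirroring the question's die() call:
// if (containsLinkFragment($entry)) {
//     die('Entries cannot contain links!');
// }
```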


Solution

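  • This is a job for regular expressions.

    Your original check rejects everything because "http://" or ".com" or ... is evaluated first as a single boolean expression (true), so strpos() never searches for those fragments at all. And since strpos() returns either an integer position or false, never true, the "!== true" test passes for every entry.

    What you need to do is something like this: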

    // A list of words you don't allow
    $disallowedWords = array(
      'these',
      'words',
      'are',
      'not',
      'allowed'
    );
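    // Search for disallowed words.
    // \b is a word boundary, so this matches 'are' but not 'care' or 'stare',
    // even at the very start or end of the entry or next to punctuation.
    // preg_quote() escapes any regex special characters in the word.
    foreach ($disallowedWords as $word) {
      if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $entry)) {
        die("The word '$word' is not allowed...");
      }
    }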
    
    // This variable should contain a regex that will match URLs; there are
    // thousands out there, take your pick. I have just used an arbitrary one I
    // found with Google. It only matches URLs that start with http://, https://
    // or ftp://. The '!' delimiters (that character does not occur in the pattern)
    // are required by preg_match(), and the 'i' modifier makes it case-insensitive.
    $urlRegex = '!(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*!i';
    
    // Search for URLs
    if (preg_match($urlRegex, $entry)) {
      die("URLs are not allowed...");
    }
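The same checks can be packaged as a function that returns a message instead of calling die(), which makes them easier to test. A minimal sketch under a few assumptions: checkEntry() is a hypothetical name; the blacklist holds plain words (\b behaves as expected only around letters, digits and underscores); the words are combined into one alternation with preg_quote(); and the long URL pattern is swapped for a much simpler one that only looks for an http://, https:// or ftp:// scheme or a word starting with "www.", so bare domains such as "somewhere.com" are not caught.

```php
<?php

// Hypothetical helper: returns a rejection message, or null if the entry is OK.
function checkEntry(string $entry, array $disallowedWords): ?string
{
    // Build one alternation such as /\b(?:these|words)\b/i from the list.
    // preg_quote() escapes regex metacharacters and the '/' delimiter.
    $quoted = array_map(
        static fn (string $w): string => preg_quote($w, '/'),
        $disallowedWords
    );
    $wordRegex = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // Skip the word check for an empty list: /\b(?:)\b/ would match anywhere.
    if ($disallowedWords !== [] && preg_match($wordRegex, $entry, $m)) {
        // $m[0] is the matched text as the user typed it (e.g. "Are").
        return "The word '{$m[0]}' is not allowed...";
    }

    // Simplified URL check (an assumption, not the long pattern above):
    // a scheme such as http://, https:// or ftp://, or a word starting with "www.".
    $urlRegex = '~\b(?:https?|ftp)://|\bwww\.~i';

    if (preg_match($urlRegex, $entry)) {
        return 'URLs are not allowed...';
    }

    return null;
}
```

In the form handler, something like if (($error = checkEntry($entry, $disallowedWords)) !== null) { die($error); } keeps the behaviour of the code above while letting the check itself be reused or unit-tested.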