Tags: php, regex, validation, preg-match, whitelist

Validate that a string only contains whitelisted characters


I am trying to write a regex match in PHP that detects whether a string contains any character other than the following:

~!@#$%^&*()_-+=\}]{[::'"/>.<,
alpha, space, numeric

I want to print the string if it contains any character other than those listed above. For example:

ßab? - Invalid
Ba,-  - Valid

I tried using preg_match() with a few inputs but was unable to get it working.


Solution

  • Update

    This expression matches the negative set of your valid range, i.e. any character that is not allowed:

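    // Escape the whitelisted punctuation for use inside a character class.
    $valid = preg_quote('~!@#$%^&*()_-+=\}]{[::\'"/>.<,', '/');

    // $str is the string to validate; the negated class matches any disallowed character.
    if (preg_match("/[^$valid\d\sA-Za-z]/", $str)) {
        echo "invalid chars\n";
    }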
    

    I'm using preg_quote() here to make sure all characters are properly escaped; passing '/' as the second argument also escapes the pattern delimiter.
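
As a rough illustration, the check could be wrapped in a small function and run against the two sample strings from the question. The function name hasInvalidChars is hypothetical, and this sketch assumes the input is treated as raw bytes (no u modifier), so a multibyte character such as ß is rejected because neither of its bytes is whitelisted.

```php
<?php
// Hypothetical helper: returns true when $str contains a byte outside the whitelist.
function hasInvalidChars(string $str): bool
{
    // Punctuation whitelist from the question, escaped for use inside a character class.
    $valid = preg_quote('~!@#$%^&*()_-+=\}]{[::\'"/>.<,', '/');

    // Negated class: anything that is not whitelisted punctuation, a digit,
    // whitespace, or an ASCII letter.
    return preg_match("/[^$valid\d\sA-Za-z]/", $str) === 1;
}

// The two sample inputs from the question.
foreach (['ßab?', 'Ba,-  '] as $input) {
    echo $input, ' - ', hasInvalidChars($input) ? 'Invalid' : 'Valid', "\n";
}
```

With these inputs the first string should be reported as Invalid and the second as Valid, matching the expectations in the question.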

    Old answer

    I calculated the negative set based on your question:

    if (preg_match('/[\x00-\x08\x0c\x0e-\x1f\x3b\x3f\x60\x7c\x7f-\xff]/', $str)) {
            echo "matches invalid chars\n";
    }
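
If you also want to know which bytes were rejected, one possible approach is to collect them with preg_match_all(). This is only a sketch: the function name findInvalidBytes is hypothetical, and reporting the bytes as hex codes is an assumption made here so that non-printable and high bytes stay readable.

```php
<?php
// Hypothetical helper: returns the distinct offending bytes as hex codes.
function findInvalidBytes(string $str): array
{
    // Same negative set as in the pattern above.
    $pattern = '/[\x00-\x08\x0c\x0e-\x1f\x3b\x3f\x60\x7c\x7f-\xff]/';
    preg_match_all($pattern, $str, $matches);

    // $matches[0] holds every full match; bin2hex() converts each byte to hex.
    return array_values(array_unique(array_map('bin2hex', $matches[0])));
}

// "\xC3\x9F" is the UTF-8 encoding of ß.
print_r(findInvalidBytes("\xC3\x9Fab?"));
```

For the sample string this lists c3 and 9f (the two bytes of ß) followed by 3f (the question mark).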
    

    To arrive at this set, you can use this code:

    $s = '~!@#$%^&*()_-+=\}]{[::\'"/>.<,'
            . join('', range('a', 'z'))
            . join('', range('A', 'Z'))
            . join('', range('0', '9'))
            . " \t\r\v\n";
    
    $missing = count_chars($s, 2);
    print_r($missing);
    

    It prints an array whose keys are the ordinal values of the bytes that do not occur in $s; from that you can generate the pattern above.
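
The last step, turning that array into a pattern, could look roughly like this. The function name buildInvalidPattern is hypothetical; the sketch assumes at least one byte value is missing from the allowed set, and it collapses consecutive values into \xNN-\xNN ranges.

```php
<?php
// Hypothetical helper: turns the byte values missing from $allowed into a
// negative-set pattern, collapsing consecutive values into ranges.
function buildInvalidPattern(string $allowed): string
{
    // Mode 2 returns only byte values with a count of zero, as ascending array keys.
    $missing = array_keys(count_chars($allowed, 2));

    $parts = [];
    $count = count($missing);
    for ($i = 0; $i < $count; $i++) {
        $start = $end = $missing[$i];
        // Extend the run while the next byte value is consecutive.
        while ($i + 1 < $count && $missing[$i + 1] === $end + 1) {
            $end = $missing[++$i];
        }
        $parts[] = $start === $end
            ? sprintf('\x%02x', $start)
            : sprintf('\x%02x-\x%02x', $start, $end);
    }

    return '/[' . implode('', $parts) . ']/';
}

// Same allowed set as in the snippet above.
$allowed = '~!@#$%^&*()_-+=\}]{[::\'"/>.<,'
    . implode('', range('a', 'z'))
    . implode('', range('A', 'Z'))
    . implode('', range('0', '9'))
    . " \t\r\v\n";

echo buildInvalidPattern($allowed), "\n";
```

For this allowed set the generated pattern is the same one used in the old answer.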