php regex escaping character-class

Regex not working as intended with hyphen in a character class


I have this code, but it is not working as I expect.

If I enter #$% or textwithouterrors, the message shown is "correct" in both cases, so the regex is not working.

I need to reject special characters, spaces, and numbers.

function validateCity($form) {
    if (preg_match("/[\~!\@#\$\%\^\&*()_+\=-[]{}\\|\'\"\;\:\/\?.>\,\<`]/", $form['city'])) {
        echo ("error");
        return false;
    } else {
        echo("correct");
        return true;
    }
}
validateCity($form);

Solution

  • There are a couple of issues going on here. The most serious one is that your character class is malformed: so far, I've noticed [, ], and - all unescaped inside it. The regex engine doesn't error out because the pattern is still valid syntax; it just means something else. The =-[ is read as a range from = to [, and the first unescaped ] then closes the class early, so everything after it (from {} through the final ]) becomes a literal sequence that has to follow the matched character. Either way, it isn't doing what you think it is.
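
    To see the effect of those unescaped characters, here is a small sketch that runs the pattern from the question as-is. It assumes PCRE reads the first unescaped ] as the end of the class, so the rest of the pattern becomes a literal tail. The variable names and the contrived test string are mine, not part of the original code.

    ```php
    <?php
    // The pattern exactly as written in the question.
    $pattern = "/[\~!\@#\$\%\^\&*()_+\=-[]{}\\|\'\"\;\:\/\?.>\,\<`]/";

    // Special characters alone are not flagged, because the literal tail
    // after the early-closed class is missing from the input.
    var_dump(preg_match($pattern, '#$%')); // int(0)

    // A contrived input: one character from the class ("A" falls inside
    // the accidental range = to [) followed by the literal tail.
    $contrived = 'A{}|\'";:/?x>,<`]';
    var_dump(preg_match($pattern, $contrived)); // int(1)
    ```

    So the pattern only reports a match for input that happens to contain that exact tail, which real city names never will.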

    Before worrying about that, address the second issue: You're blacklisting characters, but you should just use a whitelist instead. That will simplify your pattern considerably, and you won't have to worry about crazy characters like ▲ slipping past your regex.

    If you're trying to match cities, I'd go with something like this:

    // The u modifier enables UTF-8 mode so \p{L} matches multibyte letters.
    if (preg_match("/[^\p{L}\s-]/u", $form['city'])) {
        echo ("error");
        return false;
    }
    //etc...
    

    That will allow letters, dashes (think Winston-Salem, NC), and whitespace (think New Haven, CT), while blocking everything else. This might be too restrictive, I don't know; anyone who knows of a town with a number in the name is welcome to comment. However, \p{L} matches Unicode letters as long as the pattern runs in UTF-8 mode (the u modifier), so Āhualoa, HI should work.
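
    Putting the whitelist idea into the original function might look roughly like the sketch below. A few things here are my assumptions rather than anything from the question: the function returns a boolean instead of echoing, empty input is rejected, and a preg_match() failure (for example on invalid UTF-8) is treated as invalid input.

    ```php
    <?php
    // Sketch of validateCity() using a whitelist instead of a blacklist.
    function validateCity(array $form): bool
    {
        $city = $form['city'] ?? '';

        // Assumption: an empty city name is not acceptable.
        if ($city === '') {
            return false;
        }

        // Look for any character that is NOT a Unicode letter, whitespace,
        // or a hyphen. preg_match() returns 1 on a match, 0 on no match,
        // and false on error, so anything other than 0 is rejected.
        return preg_match('/[^\p{L}\s-]/u', $city) === 0;
    }

    var_dump(validateCity(['city' => 'Winston-Salem'])); // bool(true)
    var_dump(validateCity(['city' => 'New Haven']));     // bool(true)
    var_dump(validateCity(['city' => 'Āhualoa']));       // bool(true)
    var_dump(validateCity(['city' => '#$%']));           // bool(false)
    var_dump(validateCity(['city' => 'City 17']));       // bool(false)
    ```

    Comparing against 0 rather than relying on truthiness matters here: a false return from preg_match() would otherwise be read as "no bad characters found".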