Tags: python, python-3.x, regex, python-re

How to accept an ASCII character with Python re (regex)


I have a regex that checks a password so that it contains an uppercase letter, a lowercase letter, a number, a special character, and at least 8 characters.

The regex is:

regex_password = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,}$"

I use it in this function:

import re

def password_validator(password):
    # Password regex: minimum 8 characters, 1 lowercase, 1 uppercase, 1 special character
    regex_password = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,}$"

    # Reject any password that does not satisfy all of the lookaheads
    if not re.match(regex_password, password):
        raise ValueError("value is not a valid password")
    return password

However, using "²" as the special character raises an error, even though the same regex works in my JavaScript front-end validation and on several regex testing sites.

The problem is possibly related to ASCII, so how can I make Python accept this character in the regex?


Solution

  • From the documentation:

    \W

    Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_]. If the LOCALE flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore.

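    In Python 3, str patterns are Unicode-aware by default, so \w also matches non-ASCII characters that Unicode classifies as alphanumeric. "²" is one of them, which is why \W does not match it. Other implementations, such as JavaScript, interpret \w as only the ASCII alphanumeric characters and underscore by default, so their \W matches every non-ASCII character, including "²".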
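
To illustrate the difference in how \W treats "²", here is a minimal sketch using only the standard re module. The variable names unicode_match and ascii_match are made up for this example.

```python
import re

# "²" (U+00B2, SUPERSCRIPT TWO) counts as a word character for Unicode
# (str) patterns, so \W does not match it by default.
unicode_match = re.search(r"\W", "²")

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so \W matches "²".
ascii_match = re.search(r"\W", "²", flags=re.ASCII)

print(unicode_match)        # None
print(ascii_match.group())  # ²
```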

    Possible solutions:

    Spell it out:

    regex_password = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9_]).{8,}$"
    

    Or use the re.ASCII flag:

    if not re.match(regex_password, password, flags=re.ASCII):
    

    Either one of these changes should give you the results you need.
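
Putting it together, the validator from the question might look like the sketch below when the re.ASCII flag is used. Precompiling the pattern and the name PASSWORD_RE are choices made for this example, not part of the original code.

```python
import re

# PASSWORD_RE is a hypothetical name. The lookaheads are the ones from the
# question; re.ASCII makes \W equivalent to [^a-zA-Z0-9_].
PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,}$",
    flags=re.ASCII,
)


def password_validator(password):
    """Return the password if it is valid, otherwise raise ValueError."""
    if not PASSWORD_RE.match(password):
        raise ValueError("value is not a valid password")
    return password


# "²" is now accepted as the special character.
print(password_validator("Abcdefg²"))  # Abcdefg²
```

A password with no special character at all, such as "Abcdefgh", still raises ValueError.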