Search code examples
mysqlregexp-substr

Mysql regexp fails


I'm trying to find any records in table customers where the name does not contain letters. The following is what I am using. When ran, it doesn't return any records found. Would someone point out my mistake, please?

    table customers {
    name = Еarnings on thе Intеrnet from
    }

    SELECT name from customers WHERE name NOT REGEXP '[a-zA-Z]' ; 

Solution

  • If I compare a string using REGEXP '[a-zA-Z]', it matches if any character in the string matches a letter in that regexp. The string you show does contain letters like a, r, n, i, g, s. Any one of those is enough to satisfy the REGEXP comparison.

    mysql> select 'Еarnings on thе Intеrnet from' REGEXP '[a-zA-Z]' as ok;
    +----+
    | ok |
    +----+
    |  1 |
    +----+
    1 row in set (0.00 sec)
    

    The negation NOT REGEXP is the same as NOT (expr REGEXP pattern). It simply reverses the result 1 for 0, or 0 for 1.

    mysql> select 'Еarnings on thе Intеrnet from' NOT REGEXP '[a-zA-Z]' as ok;
    +----+
    | ok |
    +----+
    |  0 |
    +----+
    

    You said you want to match names that do not contain letters. What I think you mean is that you want to match names that contain any characters that are not letters, which is a different test.

    mysql> select 'Еarnings on thе Intеrnet from' REGEXP '[^a-zA-Z]' as ok;
    +----+
    | ok |
    +----+
    |  1 |
    +----+
    

    The characters Ð • µ are not in the range [a-zA-Z] and the way to express the complementary range of characters is to use [^a-zA-Z]. That is, with the ^ character inside the square brackets, a character would match if it is NOT one of the characters in that range.

    See also https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax under the item:

    • [a-dX], [^a-dX]

    Re your comment:

    I tested checking for the characters you mention, including space, apostrophe, dot, and dash:

    mysql> select 'Mr. Tim O''Toole' regexp '[^a-zA-Z \'.-]' as ok;
    +----+
    | ok |
    +----+
    |  0 |
    +----+
    
    mysql> select 'Mr. Tim $ O''Toole' regexp '[^a-zA-Z \'.-]' as ok;
    +----+
    | ok |
    +----+
    |  1 |
    +----+
    

    You don't need to put a backslash before the -, but it does need to be handled specially:

    To include a literal - character, it must be written first or last.

    This is in the documentation I linked to.