Search code examples
sqlruby-on-railsregexpostgresqldata-cleaning

Regex works for form validation but the same regex used in a query returns records that do not match it


I want to prevent my Client records from having a number or special character in it's name or last_name attributes.

I've set up a format validation in my Client model like so:

validates :name,
          format: { with: /\A[a-zÀ-ú\s]+\z/i, message: :wrong_format }
validates :last_name,
          format: { with: /\A[a-zÀ-ú\s]+\z/i, message: :wrong_format }

To my knowledge, this will allow for a string with alphabetic characters including accentuation marks like Á (a-zÀ-ú), plus any white space character (\s), and 0 or more of that combination (+). The /i at the end being for case insensitive.

The validation works fine, preventing form submissions with something like P3ter.

The problem is: using the same regex to query my Postgres DB for records that match it returns records like P3ter. And of course, negating that regex in a query to search for records that do not match it returns no records like P3ter (when I do have them).

select * from clients
where name !~* '[a-zÀ-ú\s]+'
or last_name !~* '[a-zÀ-ú\s]+'

// no results

I've tried a different regex like [[:alpha:][:space:]]+ with the same satisfactory results for form validation but the same weird result when used in a DB query.

How is it that the regex is working for format validations to prevent form submissions with a string like P3ter but the same regex returns records like P3ter when used in a query?


Solution

  • The problem is:

    You didn't add ^ at the beginning of the regex pattern, and $ the end. The regex pattern [a-zÀ-ú\s]+ matches ter of P3ter.

    ^ matches the begin of each line. while $ matches the end of each line.

    [a-zÀ-ú\s] this means that match only one character from a to z or from À to ú or \s which is a whitespace character (equivalent to [\r\n\t\f\v ]).

    [a-zÀ-ú\s]+ this part matches ter of P3ter. The + sign means match the previous token between one and unlimited times which basically means match one or more, at least match one character.

    The solution:

    select * from clients
    where name !~* '^[a-zÀ-ú\s]+$'
    or last_name !~* '^[a-zÀ-ú\s]+$'