Search code examples
postgresqlpostgresql-9.3

Postgresql constraint to check for non-ascii characters


I have a Postgresql 9.3 database that is encoded 'UTF8'. However, there is a column in database that should never contain anything but ASCII. And if non-ascii gets in there, it causes a problem in another system that I have no control over. Therefore, I want to add a constraint to the column. Note: I already have a BEFORE INSERT trigger - so that might be a good place to do the check.

What's the best way to accomplish this in PostgreSQL?


Solution

  • You can define ASCII as ordinal 1 to 127 for this purpose, so the following query will identify a string with "non-ascii" values:

    SELECT exists(SELECT 1 from regexp_split_to_table('abcdéfg','') x where ascii(x) not between 1 and 127);
    

    but it's not likely to be super-efficient, and the use of subqueries would force you to do it in a trigger rather than a CHECK constraint.

    Instead I'd use a regular expression. If you want all printable characters then you can use a range in a check constraint, like:

    CHECK (my_column ~ '^[ -~]*$')
    

    this will match everything from the space to the tilde, which is the printable ASCII range.

    If you want all ASCII, printable and nonprintable, you can use byte escapes:

    CHECK (my_column ~ '^[\x00-\x7F]*$')
    

    The most strictly correct approach would be to convert_to(my_string, 'ascii') and let an exception be raised if it fails ... but PostgreSQL doesn't offer an ascii (i.e. 7-bit) encoding, so that approach isn't possible.