Search code examples
postgresqltypesuser-defined-data-types

Enforce printable characters only in text fields


The text data type in PostgreSQL database (encoding utf-8) can contain any UTF-8 character. These include a number of control characters (https://en.wikipedia.org/wiki/Unicode_control_characters)

While I agree there are cases when the control characters are needed, there is little (to none) of use of these characters in normal attributes like persons name, document number etc. In fact allowing such characters to be stored in DB can lead to nasty problems as the characters are not visible and the value of the attribute is not what it seems to be to the end user.

As the problem seems to be very general, is there a way to prevent control chars in text fields? Maybe there is a special text type (like citext for case-incencitive text)? Or should this behaviour be realized as a domain? Are there any other options? All I could find people talk is finding these characters using regex.

I could not find any general recommendations to solving the problem so maybe I'm missing something obvious here.


Solution

  • The exact answer will depend on what you consider printable.

    However, a domain is the way to go. If you want to go with what your database collation considers a printable character, use a domain like this:

    CREATE DOMAIN printable_text AS text CHECK (VALUE !~ '[^[:print:]]');
    
    SELECT 'a'::printable_text;
     printable_text 
    ════════════════
     a
    (1 row)
    
    SELECT E'\u0007'::printable_text;  -- bell character (ASCII 7)
    ERROR:  value for domain printable_text violates check constraint "printable_text_check"