
How does one match alphabetic characters of different languages in PostgreSQL?


How can I find sequences of non-ASCII alphabetic characters (letters from other languages) in a given string in PostgreSQL? For comparison, ASCII letters can be matched using '[A-Za-z]'.

In SQL Server, @ch BETWEEN 'A' AND 'Z' also matches characters such as Ñ, ü, and Ä.


Solution

  • That depends on the collation you are using. With most natural-language collations, the same comparison works in PostgreSQL:

    SELECT 'ñ' COLLATE "en-US-x-icu" BETWEEN 'A' AND 'Z';
    
     ?column? 
    ══════════
     t
    (1 row)
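
    To illustrate the dependence on collation, here is a minimal sketch of the same comparison under the "C" collation, which compares strings by byte value rather than by language rules. In that ordering 'ñ' sorts after 'Z', so the range test fails:

    ```sql
    -- Same test as above, but with byte-wise ("C") ordering.
    -- 'ñ' is encoded with bytes greater than that of 'Z',
    -- so it falls outside the range and this returns f (false).
    SELECT 'ñ' COLLATE "C" BETWEEN 'A' AND 'Z';
    ```

    If a query behaves like this unexpectedly, checking the collation of the database or column is a reasonable first step.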
    

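    The easiest way to check whether a string contains only alphabetic characters is a regular expression. The pattern '[^[:alpha:]]' matches any single non-alphabetic character, so negating the match yields true only when no such character is present: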

    SELECT NOT '中文µxY' COLLATE "de_AT.utf8" ~ '[^[:alpha:]]';
    
     ?column? 
    ══════════
     t
    (1 row)
    
    SELECT NOT 'a+b' COLLATE "de_AT.utf8" ~ '[^[:alpha:]]';
    
     ?column? 
    ══════════
     f
    (1 row)
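
    The queries above only test whether a string is purely alphabetic. To pull the alphabetic sequences themselves out of a string, one option is regexp_matches with the same [:alpha:] class. This is a sketch, assuming a UTF-8 database built with ICU support; the sample string and the column alias "word" are made up for illustration:

    ```sql
    -- Return each maximal run of alphabetic characters as its own row.
    -- The 'g' flag makes regexp_matches report every match, not just the first.
    -- m is a text[] holding the match; m[1] is the matched text.
    SELECT m[1] AS word
    FROM regexp_matches('Ñandú 123 über+Äpfel' COLLATE "en-US-x-icu",
                        '[[:alpha:]]+', 'g') AS m;
    ```

    With a natural-language collation this should yield the rows Ñandú, über and Äpfel. Under the "C" collation, [:alpha:] may recognise only ASCII letters, so the result would differ.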