Search code examples
sqlsql-serverdiacriticspatindex

PATINDEX with letter range exclude diacritics (accented characters)


I am trying to figure out how to use a patindex to find a range of letter characters, but exclude accented characters. If I do a straight search, using the default collate (insensitive) works just fine. However, when I search a range of letters, it will match on the accented character

SELECT
    IIF('Ú' = 'U' COLLATE Latin1_General_CI_AI, 'Match', 'No') AS MatchInsensitive,
    IIF('Ú' = 'U' COLLATE Latin1_General_CI_AS, 'Match', 'No') AS MatchSensitive,
    PATINDEX('%[A-Z]%', 'Ú' COLLATE Latin1_General_CI_AI)      AS PIInsensitive,
    PATINDEX('%[A-Z]%', 'Ú' COLLATE Latin1_General_CI_AS)      AS PISensitive

Will give the following results:

MatchInsensitive MatchSensitive PIInsensitive PISensitive
---------------- -------------- ------------- -----------
Match            No             1             1

What I am really trying to do is to identify the character position of accented characters in a string, so I was really searching for PATINDEX('%[^A-Z0-9 ]%').

If I have the following query, I would expect a result of 2 SELECT PATINDEX('%[^A-Z0-9 ]%', 'médico'), but I get 0.


Solution

  • You could use a binary collation, e.g. Latin1_General_100_BIN2.

    select patindex('%[^a-zA-Z0-9 ]%', 'médico' collate Latin1_General_100_BIN2)
    

    rextester: http://rextester.com/ZICLN98474

    returns 2