Overview
CHARINDEX
is returning the wrong value occasionally when using a collation sequence like:
Latin1_General_CI_AS
but works with a collation sequence like:
SQL_Latin1_General_CP1_CI_AS
This has been encountered on MS SQL Server 2008 R2 and SQL Server 2016.
Examples
Assume the database collation sequence is:
Latin1_General_CI_AS
print CHARINDEX( CHAR(254), 'Tþ' )
-- returns 2 which is Correct print CHARINDEX( CHAR(254), 'Th' )
-- returns 1 which is WRONGprint CHARINDEX( CHAR(253), 'Th' )
-- returns 0 which is Correct print CHARINDEX( CHAR(254) Collate SQL_Latin1_General_CP1_CI_AS, 'Thþ' Collate SQL_Latin1_General_CP1_CI_AS)
-- returns 3 which is Correct print CHARINDEX( CHAR(254) Collate Latin1_General_CI_AS, 'Thþ' Collate Latin1_General_CI_AS)
-- returns 1 which is WRONGIs there a known error with the Latin1...
collation sequences?
This isn't anything specific to SQL Server.
In C#
string.Compare("þ", "th", false, new System.Globalization.CultureInfo(1033))
returns 0
indicating the strings compare equal.
Or in notepad clicking "Replace all" below
leads to
In SQL Server collations that do not start with "SQL" use windows collation rules.
For those in most locales (Iceland being an exception) the thorn character þ
expands to th
.
There is more information about this specific case in this post by Michael S. Kaplan Every rose has it's Þ..... That blog has a wealth of information about windows collations. Expansions are described in more detail here.
If you don't want these semantics you need to use a collation (likely SQL or binary) with the semantics that you do want (perhaps via an explicit collate
clause).