Search code examples
sql-serverado.netlocalecollation

SQL Server Collation / ADO.NET DataTable.Locale with different languages


we have WinForms app which stores data in SQL Server (2000, we are working on porting it in 2008) through ADO.NET (1.1, working on porting to 4.0). Everything works fine if I read data previsouly written in Western-European locale (E.g.: "test", "test ù"), but now we have to be able to mix Western and non-Western alphabets as well (E.g.: "test - ۓےۑ" - these are just random arabic chars).

On the SQL Server side, database has been set with the Latin1_General collation, the field is a nvarchar(80). If I run a SQL SELECT statement (E.g.: "SELECT * FROM MyTable WHERE field = 'test - ۓےۑ'", don't mind about the "*" or the actual names) from Query Analyzer, I get no results; the same happens if I pass the Sql statement to an ADO.NET DataAdapter to fill a DataTable. My guess is that it has something to do with collation, but I don't know how to correct this: do I have to change to collation (SQL Server) to a different one? Or do I have to set the locale on the DataAdaoter/DataTable (ADO.NET)?

Thanks in advance to anyone who will help


Solution

  • Yes, the problem is most likely the collation. The Latin1_General collation does not include the rules to sort and compare non latin characters.

    MSDN claims:

    If you must store character data that reflects multiple languages, you can minimize collation compatibility issues by always using the Unicode nchar, nvarchar, and ntext data types instead of the char, varchar, text data types. Using the Unicode data types eliminates code page conversion issues.

    Since you have already complied with this, you should read further on the info about Mixed Collation Environments here.

    Additionally I want to add that just changing a collation is not something done easy, check the MSDN for SQL 2000:

    When you set up SQL Server 2000, it is important to use the correct collation settings. You can change collation settings after running Setup, but you must rebuild the databases and reload the data. It is recommended that you develop a standard within your organization for these options. Many server-to-server activities can fail if the collation settings are not consistent across servers.

    You can specify a collation on a per column bases however:

    CREATE TABLE TestTable (
       id int,  
       GreekColCaseInsensitive nvarchar(10) collate greek_ci_as,
       LatinColCaseSensitive nvarchar(10) collate latin1_general_cs_as
       )
    

    Have a look at the different binary multilingual collations here. Depending on the charset you use, you should find one that fits your purpose.

    If you are not able or willing to change the collation of a column you can also just specify the collation to be used in the query like:

    SELECT * From TestTable 
    WHERE GreekColCaseInsensitive = N'test - ۓےۑ'
    COLLATE latin1_general_cs_as
    

    As jfrobishow pointed out the use of N in front of the string you want to use to compare is essential. What does it do:

    It denotes that the subsequent string is in Unicode (the N actually stands for National language character set). Which means that you are passing an NCHAR, NVARCHAR or NTEXT value, as opposed to CHAR, VARCHAR or TEXT. See Article #2354 for a comparison of these data types.

    You can find a quick rundown here.