Tags: c#, asp.net, sql-server-2005, globalization, cjk

Allowing Simplified Chinese Input


The company I work for is bidding on a project that will require our eCommerce solution to accept simplified Chinese input. After doing a bit of research, it seems that ASP.NET makes globalization configuration easy:

<configuration>
  <system.web>
    <globalization
      fileEncoding="utf-8"
      requestEncoding="utf-8"
      responseEncoding="utf-8"
      culture="zh-Hans"
      uiCulture="en-us" />
  </system.web>
</configuration>
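The encoding attributes above tell ASP.NET to decode incoming requests and encode responses as UTF-8. A minimal sketch of what that means at the byte level, assuming the browser posts form data in UTF-8 because the page was served as UTF-8 (the class and method names are hypothetical, for illustration only):

```csharp
using System.Text;

// Hypothetical helper showing why request and response encodings must agree.
public static class Utf8RoundTrip
{
    // Encode text the way a browser submits a form from a UTF-8 page.
    public static byte[] Encode(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decode with the same encoding, which is what requestEncoding="utf-8" asks ASP.NET to do.
    public static string Decode(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decode with a mismatched single-byte encoding (ISO-8859-1) to show
    // what happens when the sender and the decoder disagree.
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    // True if the text survives a UTF-8 encode/decode round trip unchanged.
    public static bool RoundTrips(string text)
    {
        return Decode(Encode(text)) == text;
    }
}
```

For example, RoundTrips("简体中文") is true, and Encode("中文") produces 6 bytes (each of these ideographs takes 3 bytes in UTF-8). Decoding those same bytes as ISO-8859-1 yields 6 unrelated Latin characters, which is the kind of garbage you see when the configured encodings do not match what the browser actually sends.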

Questions:

  1. Is this really all there is to it in ASP.NET? It seems too good to be true.
  2. Are there any DB considerations with SQL Server 2005? Will the DB accept simplified Chinese text without additional configuration?

Solution

  • Ad 1. The real question is how far you want to go with internationalization, because i18n is more than just accepting Unicode input. At a minimum you need to support local date, time and number formats and local collation (mostly relevant to sorting), and ensure that your application runs correctly on localized operating systems (unless you are developing a cloud, i.e. hosted, solution). You might want to read more on the topic here.

    As far as Chinese character input goes, if you are going to offer software in China, you need to support at least GB18030-2000. For that you need a .NET Framework version that supports Unicode 3.0; I believe this has been the case since .NET Framework 2.0.
    However, if you want to go one step further (which might be needed to gain a competitive edge), you might want to support GB18030-2005. The catch is that full support for the characters it adds (CJK Unified Ideographs Extension B, which lie outside the Basic Multilingual Plane) came later: Extension B was first encoded in Unicode 3.1, but complete support arrived later still (I am not really sure whether with Unicode 6.0 or Unicode 6.1). Therefore you might be forced to use the latest .NET Framework and still not be sure that it covers everything.
    You might want to read the Unicode FAQ on Han characters.

    Ad 2. I strongly advise you not to use SQL Server 2005 with Chinese characters. The reason is that the older SQL Server engine supports only UCS-2 rather than UTF-16. This might seem like a slight difference, but it really causes problems with 4-byte Han ideographs (characters outside the Basic Multilingual Plane, which UTF-16 stores as surrogate pairs). In fact, you won't be able to use them in queries (e.g. in LIKE or WHERE clauses): such a query will return all records. That's how it works. To support them, you would need to set a very specific Chinese collation, which will simply break support for other languages.
    Basically, using SQL Server 2005 with Chinese ideographs is a bad idea.
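To make the Ad 1 point about local formats and collation concrete, here is a minimal sketch using the standard System.Globalization types. It assumes a specific culture such as zh-CN for mainland China rather than the neutral zh-Hans from the question's config; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: the same values rendered and sorted per culture.
public static class CultureDemo
{
    // Formats a date (long date pattern "D") and an amount ("N2")
    // using the conventions of the named culture, e.g. "zh-CN" or "en-US".
    public static string FormatSample(DateTime date, decimal amount, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return date.ToString("D", culture) + " | " + amount.ToString("N2", culture);
    }

    // Sorts using the culture's collation rules instead of ordinal (code unit) order.
    public static List<string> SortForCulture(IEnumerable<string> items, string cultureName)
    {
        var list = new List<string>(items);
        list.Sort(StringComparer.Create(CultureInfo.GetCultureInfo(cultureName), false));
        return list;
    }
}
```

Calling FormatSample with "zh-CN" and then with "en-US" shows different date layouts for the same DateTime; the exact strings depend on the OS and .NET version, so compare them on your target servers rather than hard-coding expectations.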
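The Extension B characters mentioned under Ad 1 are supplementary characters: a .NET string stores each one as two UTF-16 code units (a surrogate pair), so string.Length over-counts them. A small sketch, with hypothetical names, that counts real code points using char.IsSurrogatePair and char.ConvertToUtf32:

```csharp
using System.Collections.Generic;

// Hypothetical helper for working with characters beyond U+FFFF.
public static class SupplementaryDemo
{
    // Returns the Unicode code points in s, combining each valid surrogate pair
    // into one value. A lone surrogate is returned as its raw code unit.
    public static List<int> ToCodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                result.Add(char.ConvertToUtf32(s, i));
                i++; // skip the low surrogate we just consumed
            }
            else
            {
                result.Add(s[i]);
            }
        }
        return result;
    }

    // Number of code points, as opposed to s.Length (UTF-16 code units).
    public static int CountCodePoints(string s)
    {
        return ToCodePoints(s).Count;
    }
}
```

For U+20000, the first Extension B ideograph, char.ConvertFromUtf32(0x20000) has Length 2 but CountCodePoints returns 1.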
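If you are stuck with SQL Server 2005 anyway, note that storing Chinese text at all assumes NVARCHAR/NCHAR columns, and that an NVARCHAR(n) limit counts UTF-16 code units, so each supplementary ideograph uses two of them and a UCS-2 engine treats it as two characters. A minimal application-side sketch, assuming you detect or safely truncate such input before saving it (names are hypothetical; this does not change how the database compares strings):

```csharp
using System;

// Hypothetical guards for sending text to a UCS-2-only database.
public static class Ucs2Safety
{
    // Truncates s to at most maxUnits UTF-16 code units (the unit an
    // NVARCHAR(n) limit counts), backing off one unit rather than
    // splitting a surrogate pair into two invalid halves.
    public static string TruncateToUnits(string s, int maxUnits)
    {
        if (s == null) throw new ArgumentNullException("s");
        if (maxUnits < 0) throw new ArgumentOutOfRangeException("maxUnits");
        if (s.Length <= maxUnits) return s;

        int cut = maxUnits;
        if (cut > 0 && char.IsHighSurrogate(s[cut - 1]) && char.IsLowSurrogate(s[cut]))
            cut--;
        return s.Substring(0, cut);
    }

    // True if s contains any supplementary character, i.e. one that a
    // UCS-2 engine will see as two separate characters.
    public static bool HasSupplementary(string s)
    {
        for (int i = 0; i + 1 < s.Length; i++)
        {
            if (char.IsSurrogatePair(s[i], s[i + 1])) return true;
        }
        return false;
    }
}
```

With s = "中" followed by U+20000 (3 code units), TruncateToUnits(s, 2) returns "中" instead of leaving half a surrogate pair, and HasSupplementary(s) lets you flag input that the LIKE/WHERE problem described in Ad 2 would affect.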