Tags: c#, .net, sql-server, collation, stringcomparer

What .NET StringComparer is equivalent to SQL Server's Latin1_General_CI_AS?


I am implementing a caching layer between my database and my C# code. The idea is to cache the results of certain DB queries based on the parameters to the query. The database is using the default collation, either SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS. Based on some brief googling, I believe the two are equivalent for equality and differ only in sorting.

I need a .NET StringComparer that gives me the same behavior as the database's collation, at least for equality testing and hash code generation. The goal is to use the StringComparer in a .NET dictionary in C# code to determine whether a particular string key is already in the cache.

A really simplified example:

private static readonly StringComparer comparer = StringComparer.???; // What goes here?

private static Dictionary<string, MyObject> cache =
    new Dictionary<string, MyObject>(comparer);

public static MyObject GetObject(string key) {
    if (cache.ContainsKey(key)) {
        return cache[key].Clone();
    } else {
        // invoke SQL "select * from mytable where mykey = @mykey"
        // with parameter @mykey set to key
        MyObject result = /* object constructed from the SQL result */;
        cache[key] = result;
        return result.Clone();
    }
}
public static void SaveObject(string key, MyObject obj) {
    // invoke SQL "update mytable set ... where mykey = @mykey" etc
    cache[key] = obj.Clone();
}

The reason it's important that the StringComparer matches the database's collation is that both false positives and false negatives would have bad effects for the code.

If the StringComparer says that two keys A and B are equal when the database believes they are distinct, then there could be two rows in the database with those two keys, but the cache will prevent the second one from ever being returned if A and B are requested in succession: the get for B will incorrectly hit the cache and return the object that was retrieved for A.

The problem is more subtle if the StringComparer says that A and B are different when the database believes they are equal, but no less problematic. GetObject calls for both keys would be fine, and return objects corresponding to the same database row. But then calling SaveObject with key A would leave the cache incorrect; there would still be a cache entry for key B that has the old data. A subsequent GetObject(B) would give outdated information.
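Both failure modes can be reproduced without a database. The following is only a sketch: FakeTable, MismatchedCache and MismatchDemo are hypothetical names, and the ordinal comparers are assumed stand-ins for "what the database considers equal", not a faithful model of Latin1_General_CI_AS. In each method the cache and the fake table are deliberately given different comparers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the table; its comparer plays the role of the collation.
public sealed class FakeTable
{
    private readonly Dictionary<string, string> rows;

    public FakeTable(IEqualityComparer<string> collation)
    {
        rows = new Dictionary<string, string>(collation);
    }

    public string Select(string key) =>
        rows.TryGetValue(key, out var value) ? value : null;

    public void Upsert(string key, string value) => rows[key] = value;
}

// Read-through cache whose key comparer may disagree with the table's.
public sealed class MismatchedCache
{
    private readonly FakeTable table;
    private readonly Dictionary<string, string> cache;

    public MismatchedCache(FakeTable table, IEqualityComparer<string> comparer)
    {
        this.table = table;
        cache = new Dictionary<string, string>(comparer);
    }

    public string Get(string key)
    {
        if (cache.TryGetValue(key, out var cached)) return cached;
        var value = table.Select(key);
        cache[key] = value;
        return value;
    }

    public void Save(string key, string value)
    {
        table.Upsert(key, value);
        cache[key] = value;
    }
}

public static class MismatchDemo
{
    // Cache is stricter than the table: "abc" and "ABC" are one row, two cache entries.
    public static string FalseNegative()
    {
        var table = new FakeTable(StringComparer.OrdinalIgnoreCase);
        var cache = new MismatchedCache(table, StringComparer.Ordinal);
        cache.Save("abc", "v1");
        cache.Get("ABC");         // cache miss; table returns "v1", cached under "ABC"
        cache.Save("abc", "v2");  // row updated, but only the "abc" entry is refreshed
        return cache.Get("ABC");  // stale value
    }

    // Cache is looser than the table: two distinct rows collapse into one cache entry.
    public static string FalsePositive()
    {
        var table = new FakeTable(StringComparer.Ordinal);
        table.Upsert("abc", "row for abc");
        table.Upsert("ABC", "row for ABC");
        var cache = new MismatchedCache(table, StringComparer.OrdinalIgnoreCase);
        cache.Get("abc");
        return cache.Get("ABC");  // wrongly served from the "abc" entry
    }
}
```

Here FalseNegative() returns the outdated "v1" even though the row now holds "v2", and FalsePositive() returns "row for abc" for the key "ABC".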

So for my code to work correctly I need the StringComparer to match the database behavior for equality testing and hashcode generation. My googling so far has yielded lots of information about the fact that SQL collations and .NET comparisons are not exactly equivalent, but no details on what the differences are, whether they are limited to only differences in sorting, or whether it is possible to find a StringComparer that is equivalent to a specific SQL collation if a general-purpose solution is not needed.

(Side note: the caching layer is general purpose, so I cannot make particular assumptions about the nature of the key or about which collation would be appropriate. All the tables in my database share the same default server collation. I just need to match the collation as it exists.)


Solution

  • Take a look at the CollationInfo class. It is located in an assembly called Microsoft.SqlServer.Management.SqlParser.dll although I am not totally sure where to get this. There is a static list of Collations (names) and a static method GetCollationInfo (by name).

    Each CollationInfo has a Comparer. It is not exactly the same as a StringComparer but has similar functionality.

    EDIT: Microsoft.SqlServer.Management.SqlParser.dll is a part of the Shared Management Objects (SMO) package. This feature can be downloaded for SQL Server 2008 R2 here:

    http://www.microsoft.com/download/en/details.aspx?id=16978#SMO

    EDIT: CollationInfo does have a property named EqualityComparer which is an IEqualityComparer<string>.
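Since EqualityComparer is an IEqualityComparer<string>, it should be usable directly as the key comparer of a Dictionary. The sketch below makes some assumptions that have not been verified here: that CollationInfo lives in the Microsoft.SqlServer.Management.SqlParser.Metadata namespace, and that GetCollationInfo accepts the name "Latin1_General_CI_AS". For that reason the SMO call appears only in a comment. KeyedCache and KeyedCacheFactory are hypothetical names, and StringComparer.OrdinalIgnoreCase is only a placeholder so that the example compiles without the SMO assembly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache wrapper; the comparer decides which keys count as "the same".
public sealed class KeyedCache<TValue>
{
    private readonly Dictionary<string, TValue> entries;

    public KeyedCache(IEqualityComparer<string> keyComparer)
    {
        entries = new Dictionary<string, TValue>(keyComparer);
    }

    public int Count => entries.Count;

    public bool TryGet(string key, out TValue value) =>
        entries.TryGetValue(key, out value);

    public void Set(string key, TValue value) => entries[key] = value;
}

public static class KeyedCacheFactory
{
    // With the SMO assembly referenced, the comparer would (by assumption) come from:
    //   CollationInfo.GetCollationInfo("Latin1_General_CI_AS").EqualityComparer
    // The fallback below is a placeholder and is NOT equivalent to the SQL collation.
    public static KeyedCache<TValue> Create<TValue>(
        IEqualityComparer<string> collationComparer = null)
    {
        return new KeyedCache<TValue>(
            collationComparer ?? StringComparer.OrdinalIgnoreCase);
    }
}
```

Passing the comparer in from outside keeps the cache independent of where the comparer comes from, so the collation-based comparer can be swapped in once it has been checked against the real database.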