Search code examples
c#sql-serverwindowsfilesystemsculture

Windows Invariant Culture Puzzle


I have a question about the windows invariant culture.

Succinctly, my question is:

does there exist any pair of characters c1, and c2 such that:

lower(c1, invariant) =latin-general lower(c2, Invariant)

but

lower(c1, invaraint) !=invariant lower(c2, invariant)

Background:

I need to store an invariant lower case string (representing a file name) inside of SQL Server Compact, which does not support windows invariant collations.

Ideally I would like to do this without having to pull all of my comparison logic out of the database and into my app.

The idea I had for solving this was to store 2 versions of all file names: one that is used for displaying data to the customer, and another that is used for performing comparisons. The comparison column would be converted to lower case using the windows invariant locale before storing it in the database.

However, I don't really have any idea what kind of mappings the invariant culture does, other than the fact that its what windows uses for comparing file names.

I'm wondering if it is possible to get false positives (or false negatives) as a result of this scheme.

That is, can I produce characters (previously lower cased using the invariant culture) that compare equal to each other using the latin-general-1 case insensitive SQL server collation, but do not compare equal to each other under the invariant culture?

If this can happen, then my app may consider 2 files that Windows thinks are different as being the same. This could ultimately lead to data loss.

NOTE:

I am aware that it is possible to have case sensitive files on Windows. I don't need to support those scenarios, however.


Solution

  • By looking through the answers to this question:

    win32-file-name-comparison

    which I asked a while back.,

    I found an indirect link the following page:

    http://msdn.microsoft.com/en-us/library/ms973919.aspx

    It suggests using an ordinal comparison after an invariant upper case as the best way to mimic what the file system does.

    So I think if I use as "case sensitive, accent sensitive" collation in the database, and do a "upper" using the invariant local before storing the files I should be ok.

    Does anyone know if there are any problems with that?