Currently I have code which, based on CultureInfo cultureInfo = new CultureInfo("ja-JP"), does a search using
    bool found = cultureInfo.CompareInfo.IndexOf(x, y,
        CompareOptions.IgnoreCase |
        CompareOptions.IgnoreKanaType |
        CompareOptions.IgnoreWidth
    ) >= 0;
Since x.IndexOf(y) is way faster, and my x values are numerous and rarely change, I'd like to canonicalize the x values once and, when performing the search, do a simple

    canonicalizedX.IndexOf(Canonicalize(y), StringComparison.Ordinal) >= 0;
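A minimal sketch of the canonicalize-once idea, assuming the canonicalization function is passed in as a delegate (the class CanonicalizedSearch and its members are hypothetical, not part of any library). The haystacks are canonicalized once at construction; each query canonicalizes only the needle and then runs an ordinal search. StringComparison.Ordinal matters: the plain string.IndexOf(string) overload is culture-sensitive, so without it every search would still go through the culture's comparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: stores the haystacks in canonical form once,
// then answers "which x contain y?" with cheap ordinal searches.
public sealed class CanonicalizedSearch
{
    private readonly Func<string, string> _canonicalize;
    private readonly List<string> _canonicalized = new List<string>();

    public CanonicalizedSearch(IEnumerable<string> haystacks, Func<string, string> canonicalize)
    {
        _canonicalize = canonicalize ?? throw new ArgumentNullException(nameof(canonicalize));
        foreach (string x in haystacks)
            _canonicalized.Add(_canonicalize(x)); // cost paid once, when the x values are loaded
    }

    // Returns the indices of all haystacks that contain y after canonicalization.
    public List<int> FindAll(string y)
    {
        string needle = _canonicalize(y); // cost paid once per query
        var hits = new List<int>();
        for (int i = 0; i < _canonicalized.Count; i++)
        {
            // Ordinal: compares UTF-16 code units directly, no culture rules.
            if (_canonicalized[i].IndexOf(needle, StringComparison.Ordinal) >= 0)
                hits.Add(i);
        }
        return hits;
    }
}
```

FindAll deliberately returns which haystacks match rather than character positions, because canonicalization can change a string's length, so an offset in the canonicalized text need not correspond to the same offset in the original. The Canonicalize method shown below could be passed as the delegate.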
My question: is there anything in the .NET libraries that I could use to implement the Canonicalize() function, using my CultureInfo and CompareOptions?
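For comparison, a purely managed approximation is possible with built-in APIs, assuming Unicode NFKC normalization is an acceptable stand-in for width folding. It is not equivalent to the CompareOptions above: NFKC folds full-width Latin letters to half-width (the opposite direction from a full-width mapping, harmless as long as both sides are folded the same way) and also rewrites other compatibility characters, for example ① becomes 1. The kana step relies on katakana U+30A1..U+30F6 sitting exactly 0x60 above the corresponding hiragana. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class ManagedCanonicalizer
{
    // Hypothetical approximation of IgnoreCase | IgnoreKanaType | IgnoreWidth.
    public static string ManagedCanonicalize(string src)
    {
        if (string.IsNullOrEmpty(src)) return src;

        // 1. Width: NFKC maps half-width katakana to full-width (composing the
        //    voiced sound mark, e.g. ｶﾞ -> ガ) and full-width Latin to ASCII.
        string s = src.Normalize(NormalizationForm.FormKC);

        // 2. Case: culture-independent lowercasing.
        s = s.ToLowerInvariant();

        // 3. Kana type: shift katakana U+30A1..U+30F6 down by 0x60 into hiragana.
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            sb.Append(c >= '\u30A1' && c <= '\u30F6' ? (char)(c - 0x60) : c);
        }
        return sb.ToString();
    }
}
```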
I ended up using LCMapStringEx, and it works fine for me. It is not based on (an arbitrary set of) CompareOptions, but the CompareInfo.GetSortKey docs led me to LCMapString, so IndexOf on canonicalized strings should yield the same result as CultureInfo.CompareInfo.IndexOf with the equivalent options, which are hardcoded here as dwMapFlags (LCMAP_LOWERCASE for IgnoreCase, LCMAP_HIRAGANA for IgnoreKanaType, LCMAP_FULLWIDTH for IgnoreWidth):
    using System;
    using System.Runtime.InteropServices;

    // The containing class name is arbitrary.
    public static class JapaneseText
    {
        public static string Canonicalize(string src)
        {
            if (string.IsNullOrEmpty(src)) return src; // nothing to map

            string localeName = "ja-JP";
            // LOWERCASE ~ IgnoreCase, HIRAGANA ~ IgnoreKanaType, FULLWIDTH ~ IgnoreWidth
            uint dwMapFlags = LCMAP_LOWERCASE | LCMAP_HIRAGANA | LCMAP_FULLWIDTH;
            int nLen = src.Length;

            // With cchDest = 0 the function returns the required size in characters.
            int nChars = LCMapStringEx(localeName, dwMapFlags, src, nLen, IntPtr.Zero, 0, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
            if (nChars <= 0) return src; // mapping failed: fall back to the input

            // The buffer is allocated in bytes, but cchDest must be passed in characters.
            IntPtr ptr = Marshal.AllocHGlobal(nChars * sizeof(char));
            try
            {
                nChars = LCMapStringEx(localeName, dwMapFlags, src, nLen, ptr, nChars, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
                return nChars > 0 ? Marshal.PtrToStringUni(ptr, nChars) : src;
            }
            finally
            {
                Marshal.FreeHGlobal(ptr);
            }
        }

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        static extern int LCMapStringEx(
            string lpLocaleName,
            uint dwMapFlags,
            string lpSrcStr,
            int cchSrc,
            IntPtr lpDestStr,
            int cchDest,
            IntPtr lpVersionInformation,
            IntPtr lpReserved,
            IntPtr sortHandle);

        private const uint LCMAP_LOWERCASE = 0x100;
        private const uint LCMAP_UPPERCASE = 0x200;
        private const uint LCMAP_SORTKEY = 0x400;
        private const uint LCMAP_BYTEREV = 0x800;
        private const uint LCMAP_HIRAGANA = 0x100000;
        private const uint LCMAP_KATAKANA = 0x200000;
        private const uint LCMAP_HALFWIDTH = 0x400000;
        private const uint LCMAP_FULLWIDTH = 0x800000;
    }
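Since the equivalence with CompareInfo.IndexOf is only expected ("should"), it may be worth checking it on real data. A minimal sketch, assuming the canonicalizer is passed as a delegate (the class CanonicalizationCheck and method FindMismatches are hypothetical): it runs the culture-aware search and the canonicalize-then-ordinal search over sample pairs and reports the pairs where they disagree.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CanonicalizationCheck
{
    // Hypothetical helper: returns the (x, y) pairs for which the two searches differ.
    public static List<(string X, string Y)> FindMismatches(
        CompareInfo compareInfo,
        CompareOptions options,
        Func<string, string> canonicalize,
        IEnumerable<(string X, string Y)> pairs)
    {
        var mismatches = new List<(string X, string Y)>();
        foreach (var (x, y) in pairs)
        {
            // Reference result: the original culture-aware search.
            bool expected = compareInfo.IndexOf(x, y, options) >= 0;
            // Candidate result: canonicalize both sides, then search ordinally.
            bool actual = canonicalize(x).IndexOf(canonicalize(y), StringComparison.Ordinal) >= 0;
            if (expected != actual)
                mismatches.Add((x, y));
        }
        return mismatches;
    }
}
```

On Windows this could be called with new CultureInfo("ja-JP").CompareInfo, the three CompareOptions from the question, and the Canonicalize method above, over a sample of the real x values and typical queries.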
I also tried Microsoft.VisualBasic.StrConv, which works, but is twice as slow as P/Invoking LCMapStringEx.