c# .net validation unicode char

How to validate Unicode letters in .NET that contain invisible chars


In .NET, I have a simple check that asserts a string consists of letters or digits only:

if (str.Any(c => !char.IsLetterOrDigit(c)))
{
  throw new Exception();
}

The problem is that the string can be in any language, and certain Unicode letters are represented by multiple characters. For example, the Turkish "i̇ş" uses "i" followed by a combining dot above (U+0307) to represent the first letter.

As a result, the above validation fails, because the combining mark is neither a letter nor a digit. I don't want to hard-code invisible Unicode characters to skip over; I'd like a generic solution that applies to all languages.

What is the fix for this?


Solution

  • You can skip any character whose category, as returned by char.GetUnicodeCategory(c), is UnicodeCategory.NonSpacingMark. No need to hard-code anything.
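
A minimal sketch of that idea, assuming the goal is simply "every char is a letter, a digit, or a non-spacing mark". The class name LetterValidation and the methods IsLetterDigitOrMark and IsValid are hypothetical names chosen for illustration; only char.IsLetterOrDigit, char.GetUnicodeCategory and UnicodeCategory.NonSpacingMark come from .NET itself.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper class; the names are illustrative, not from the question.
public static class LetterValidation
{
    // True for letters, digits, and combining marks of category Mn
    // (for example U+0307 COMBINING DOT ABOVE).
    public static bool IsLetterDigitOrMark(char c) =>
        char.IsLetterOrDigit(c) ||
        char.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;

    // True when every char in the string passes the check above.
    // Note: an empty string, or one made only of marks, also passes.
    public static bool IsValid(string str) => str.All(IsLetterDigitOrMark);
}
```

With this in place the original check could become `if (!LetterValidation.IsValid(str)) throw new Exception();`. One caveat of this sketch: a string made up solely of combining marks is accepted, so if that matters you may want an additional rule, such as requiring the first character to be a letter or digit.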