In .NET, I have a simple check that asserts a string consists only of letters or digits:
if (str.Any(c => !char.IsLetterOrDigit(c)))
{
    throw new Exception();
}
The problem is that the string can be in any language, and certain Unicode letters are represented by multiple characters. For example, the Turkish "i̇ş" is encoded as "i" followed by a combining dot above to represent the first letter.
As a result, the above validation fails. I don't want to hard-code invisible Unicode characters to skip over. I'd like a generic solution that applies to all languages.
What is the fix for this?
You can skip characters in the NonSpacingMark category, as returned by char.GetUnicodeCategory(c). No need to hard-code anything.
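A minimal sketch of that idea, assuming you only want to accept non-spacing marks in addition to letters and digits. The class and method names (TextValidation, IsLetterDigitOrMark, EnsureLettersOrDigits) are made up for illustration, and ArgumentException is just one reasonable choice of exception:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class TextValidation
{
    // Hypothetical helper: true for letters, digits, and combining marks
    // in the NonSpacingMark category (such as the combining dot above).
    public static bool IsLetterDigitOrMark(char c) =>
        char.IsLetterOrDigit(c)
        || char.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;

    // Hypothetical wrapper around the original check from the question.
    public static void EnsureLettersOrDigits(string str)
    {
        if (str.Any(c => !IsLetterDigitOrMark(c)))
        {
            throw new ArgumentException(
                "String must contain only letters, digits, or combining marks.",
                nameof(str));
        }
    }
}
```

Note that this only allows NonSpacingMark. Whether to also allow UnicodeCategory.SpacingCombiningMark or UnicodeCategory.EnclosingMark depends on the scripts you expect, so treat that as a judgment call. The check also does not verify that a mark actually follows a base letter, so a string made up solely of combining marks would pass.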