Let's say a file contains non-English text. We can read its contents with the FileIO.ReadLinesAsync method, which gives us each line as a string of characters. How can I extract each letter (of a non-English alphabet) from such a string? The C# code below illustrates my question.
List<string> finalAlphabets = new List<string>();
IList<string> alphabetLines = await FileIO.ReadLinesAsync(_languageFile, UnicodeEncoding.Utf8);
if (alphabetLines.Count != 0)
{
    foreach (string alphabetLine in alphabetLines)
    {
        // Suppose alphabetLine is "కాకికు". I want to extract each letter and add it to the finalAlphabets list.
        // How do I extract this letter from alphabetLine? alphabetLine.Length is 6, but in Telugu this is a 3-letter word.
        finalAlphabets.Add("కా");
    }
}
There is a set of text information classes, TextInfo and StringInfo, and in particular you are likely looking for TextElementEnumerator, which lets you find "text element" boundaries.
A simplified sample from the MSDN article:
var myTEE = System.Globalization.StringInfo.GetTextElementEnumerator("కాకికు");
while (myTEE.MoveNext())
{
    Console.WriteLine("[{0}]:\t{1}\t{2}",
        myTEE.ElementIndex, myTEE.Current, myTEE.GetTextElement());
}
This produces the following output:
[0]: కా కా
[2]: కి కి
[4]: కు కు
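To tie this back to the loop in the question, here is a minimal sketch of a helper that collects the text elements of one line into a list. The class name TextElementSplitter and the method name SplitIntoTextElements are made up for illustration; only StringInfo.GetTextElementEnumerator and TextElementEnumerator come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not a framework type): splits a string into text elements.
static class TextElementSplitter
{
    public static List<string> SplitIntoTextElements(string line)
    {
        var elements = new List<string>();

        // The enumerator walks the string one text element at a time,
        // keeping a base character together with its combining marks.
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(line);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }

        return elements;
    }
}
```

Inside the foreach loop from the question, the hard-coded Add call could then become finalAlphabets.AddRange(TextElementSplitter.SplitIntoTextElements(alphabetLine)). For "కాకికు" this should yield the same three elements as the output above. Note that a "text element" is a Unicode notion and may not always match what a reader considers one letter (conjunct consonants are one case to check), and the exact boundaries can depend on the .NET version, so it is worth testing with real data from your language file.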