c# linq character-encoding globalization diacritics

Get index of first non-standard English character


I'm trying to process a string and split it into two parts at the first character that is not in the standard English alphabet. For example, given "This is a stríng with áccents.", I need to know the index of the first (or every) accented character (í).

I think the solution is somewhere between System.Text.Encoding and System.Globalization, but I'm missing something...

The important thing is to know whether a character has an accent and, if possible, to exclude spaces.

void Main()
{
    var str = "This is a stríng with áccents.";
    var strBeforeFirstAccent = str.Substring(0, getIndexOfFirstCharWithAccent(str));
    Console.WriteLine(strBeforeFirstAccent);

}

int getIndexOfFirstCharWithAccent(string str){
    //Process logic
    return 13;
}

Thanks!


Solution

  • The regex [^a-zA-Z ] matches any character other than an unaccented Roman letter or a space. Note that this also matches digits and punctuation, such as the final period in the example.

    So:

    // Requires: using System.Text.RegularExpressions;
    var regex = new Regex("[^a-zA-Z ]");
    var match = regex.Match("This is a stríng with áccents.");
    

    match.Value will be í, and match.Index will contain its location (13 in this example).
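    As a rough sketch, the regex above could be dropped into the question's stub like this. The class and method names (AccentFinder, IndexOfFirstNonEnglishChar, HasAccent, IndexOfFirstAccentedChar) are made up for illustration. Since the regex also stops at punctuation and digits, the sketch adds a second, different technique that is not part of the answer above: decomposing each character with Unicode normalization (FormD) and looking for a combining mark. It assumes the runtime's normalization support is available and that "accent" means a non-spacing mark.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class AccentFinder
{
    // Same pattern as the answer: anything that is not an
    // unaccented ASCII letter or a space.
    private static readonly Regex NonEnglish = new Regex("[^a-zA-Z ]");

    // Index of the first regex match, or -1 when nothing matches.
    public static int IndexOfFirstNonEnglishChar(string str)
    {
        Match match = NonEnglish.Match(str);
        return match.Success ? match.Index : -1;
    }

    // Alternative (an assumption, not from the answer): a character
    // "has an accent" if its canonical decomposition contains a
    // non-spacing mark. A stand-alone combining mark also counts.
    public static bool HasAccent(char c)
    {
        // Lone surrogates cannot be normalized, so skip them.
        if (char.IsSurrogate(c)) return false;

        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        foreach (char part in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(part) == UnicodeCategory.NonSpacingMark)
                return true;
        }
        return false;
    }

    // Index of the first accented character, or -1 when there is none.
    public static int IndexOfFirstAccentedChar(string str)
    {
        for (int i = 0; i < str.Length; i++)
        {
            if (HasAccent(str[i])) return i;
        }
        return -1;
    }

    static void Main()
    {
        var str = "This is a stríng with áccents.";
        int index = IndexOfFirstNonEnglishChar(str);
        // Fall back to the whole string when no match is found,
        // because Substring would throw on a length of -1.
        Console.WriteLine(index >= 0 ? str.Substring(0, index) : str); // This is a str

        // The two approaches differ once punctuation comes first:
        Console.WriteLine(IndexOfFirstNonEnglishChar("Hello, wórld")); // 5 (the comma)
        Console.WriteLine(IndexOfFirstAccentedChar("Hello, wórld"));   // 8 (the ó)
    }
}
```

    To get every accented position rather than just the first, regex.Matches(str) can be enumerated in the same way, reading Index from each Match.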