C# Counting occurrences of characters in strings containing emoji

I can count the occurrences of each character in a string with the following method:

private List<CountClass> CountCharacterOccurences(string theText)
{
    List<CountClass> theCountList = new();

    // Repeatedly take the first char, count it, then strip it from the text.
    while (theText.Length > 0)
    {
        int cal = 0;
        for (int j = 0; j < theText.Length; j++)
            if (theText[0] == theText[j])
                cal++;

        theCountList.Add(new CountClass { Category = theText[0].ToString(), Count = cal });

        // Remove every occurrence of the char that was just counted.
        theText = theText.Replace(theText[0].ToString(), string.Empty);
    }

    return theCountList;
}

However, if my string contains emojis, this logic does not work: emojis seem to be encoded as two or more chars, so reading the string char by char gives wrong results.

I can identify and isolate the emojis in my string with a regex, but that does not seem to help with the counting.

Any help? Thanks!

Solution

I assume you want to treat each grapheme cluster of the string as a separate character. A grapheme cluster is displayed as a single "unit" of text. In addition to single chars and surrogate pairs, this also includes things like emojis modified with skin tone modifiers, zero-width joiner sequences, combining diacritics, etc. This means that a "man with dark skin" emoji would be counted separately from a "man with light skin" emoji.
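To make the difference between chars, runes, and grapheme clusters concrete, here is a minimal sketch that measures one string in all three units. The TextUnits class and its Measure method are hypothetical names introduced for illustration, and the grapheme count assumes .NET 5 or later, where StringInfo follows the Unicode extended grapheme cluster rules.

```csharp
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the question or of any library.
public static class TextUnits
{
    // Returns the length of the text in UTF-16 chars, in runes
    // (Unicode scalar values), and in grapheme clusters (text elements).
    public static (int Chars, int Runes, int Graphemes) Measure(string text) =>
        (text.Length,
         text.EnumerateRunes().Count(),
         new StringInfo(text).LengthInTextElements);
}
```

For a single "man with dark skin" emoji ("\U0001F468\U0001F3FF"), this should report 4 chars and 2 runes; on .NET 5 or later it should also report 1 grapheme cluster, which is the unit the question is really after.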

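You can use StringInfo.GetTextElementEnumerator to iterate through the grapheme clusters. Note that this relies on .NET 5 or later; earlier runtimes split text elements differently and may report a skin tone modifier as a separate element.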

using System.Globalization;

var dictionary = new Dictionary<string, int>();
var graphemeEnumerator = StringInfo.GetTextElementEnumerator("👨🏿👨🏿👨🏻");
while (graphemeEnumerator.MoveNext()) {
    var grapheme = graphemeEnumerator.GetTextElement();
    if (dictionary.ContainsKey(grapheme)) {
        dictionary[grapheme]++;
    } else {
        dictionary.Add(grapheme, 1);
    }
}

// { [👨🏿, 2], [👨🏻, 1] }

You can then convert the dictionary into your CountClass if you want.
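As a minimal sketch of that conversion, the method below counts grapheme clusters and returns them as a List<CountClass> in order of first appearance. The shape of CountClass (a string Category and an int Count) is an assumption inferred from the object initializer in the question, and GraphemeCounter and CountGraphemeOccurrences are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Assumed shape of CountClass, inferred from the question's usage.
public class CountClass
{
    public string Category { get; set; } = string.Empty;
    public int Count { get; set; }
}

// Hypothetical helper class for illustration.
public static class GraphemeCounter
{
    public static List<CountClass> CountGraphemeOccurrences(string theText)
    {
        var counts = new Dictionary<string, int>();
        var order = new List<string>(); // graphemes in order of first appearance

        var enumerator = StringInfo.GetTextElementEnumerator(theText);
        while (enumerator.MoveNext())
        {
            string grapheme = enumerator.GetTextElement();
            if (counts.TryGetValue(grapheme, out int current))
            {
                counts[grapheme] = current + 1;
            }
            else
            {
                counts[grapheme] = 1;
                order.Add(grapheme);
            }
        }

        // Build the result list in first-appearance order.
        var result = new List<CountClass>();
        foreach (string grapheme in order)
        {
            result.Add(new CountClass { Category = grapheme, Count = counts[grapheme] });
        }
        return result;
    }
}
```

Tracking first-appearance order separately keeps the output order the same as in the original char-based method, since Dictionary does not guarantee enumeration order.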

Note that Dmitry Bychenko's answer iterates over the runes (aka Unicode scalar values) instead. For "👨🏿👨🏿👨🏻", their answer will count 3 man emojis, 2 dark skin tone modifiers, and 1 light skin tone modifier.
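For comparison, rune-based counting can be sketched as follows. This is an illustration of the general approach using string.EnumerateRunes (available since .NET Core 3.0), not a reproduction of that answer's code; RuneCounter and CountRuneOccurrences are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class for illustration.
public static class RuneCounter
{
    // Counts each Unicode scalar value separately, so an emoji and its
    // skin tone modifier end up under two different keys.
    public static Dictionary<string, int> CountRuneOccurrences(string theText)
    {
        var counts = new Dictionary<string, int>();
        foreach (Rune rune in theText.EnumerateRunes())
        {
            string key = rune.ToString();
            counts[key] = counts.TryGetValue(key, out int current) ? current + 1 : 1;
        }
        return counts;
    }
}
```

Called with "\U0001F468\U0001F3FF\U0001F468\U0001F3FF\U0001F468\U0001F3FB", this produces three keys: the base man emoji with a count of 3, the dark skin tone modifier with 2, and the light skin tone modifier with 1.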