c# spell-checking hunspell nhunspell

How to display all mistaken words


I have some text in richTextBox1.

  1. I have to sort the words by their frequency and display them in richTextBox2. This part seems to work.

  2. I have to find all misspelled words and display them in richTextBox4. I'm using Hunspell, but apparently I'm missing something: almost all words are displayed in richTextBox4, not only the wrong ones.

Code:

foreach (Match match in wordPattern.Matches(str))
{
    if (!words.ContainsKey(match.Value))
        words.Add(match.Value, 1);
    else
        words[match.Value]++;
}

string[] words2 = new string[words.Keys.Count];
words.Keys.CopyTo(words2, 0);

int[] freqs = new int[words.Values.Count];
words.Values.CopyTo(freqs, 0);

Array.Sort(freqs, words2);
Array.Reverse(freqs);
Array.Reverse(words2);

Dictionary<string, int> dictByFreq = new Dictionary<string, int>();

for (int i = 0; i < freqs.Length; i++)
{
    dictByFreq.Add(words2[i], freqs[i]);
}

Hunspell hunspell = new Hunspell("en_US.aff", "en_US.dic");

StringBuilder resultSb = new StringBuilder(dictByFreq.Count); 

foreach (KeyValuePair<string, int> entry in dictByFreq)
{
    resultSb.AppendLine(string.Format("{0} [{1}]", entry.Key, entry.Value));
    richTextBox2.Text = resultSb.ToString();

    bool correct = hunspell.Spell(entry.Key);

    if (correct == false)                
    {
        richTextBox4.Text = resultSb.ToString();
    }    
}

Solution

  • In addition to the above answer (which should work if your Hunspell.Spell method works correctly), I have a few suggestions to shorten your code. You are adding the matches to your dictionary and counting the number of occurrences of each one. Then you appear to be sorting them by frequency in descending order (so the most frequent match ends up at index 0 in the result). Here is a snippet that should make your function a lot shorter:

    IOrderedEnumerable<KeyValuePair<string, int>> dictByFreq = words.OrderBy<KeyValuePair<string, int>, int>((KeyValuePair<string, int> kvp) =>  -kvp.Value);
    

    This lets the .NET framework (LINQ) do the sorting for you. words.OrderBy takes a Func argument that returns the value to sort on. Enumerating a dictionary yields key/value pairs, and you want to sort on the values rather than the keys, so the lambda selects kvp.Value. Negating the value makes the ascending sort come out in descending order of frequency. The call returns an IOrderedEnumerable object, which has to be stored. And since that is enumerable, you don't even have to put it back into a dictionary! If you really need to do other operations on it later, you can call dictByFreq.ToList(), which returns an object of type List<KeyValuePair<string, int>>.
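
    As a minimal sketch of the same idea, the sort can also be written with OrderByDescending and type inference, which avoids the negation. The class name FrequencySortSketch, the helper SortByFrequency, and the sample counts are made up for illustration and are not part of the original code:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class FrequencySortSketch
    {
        // Returns the word/count pairs ordered from most to least frequent.
        public static List<KeyValuePair<string, int>> SortByFrequency(Dictionary<string, int> words)
        {
            return words.OrderByDescending(kvp => kvp.Value).ToList();
        }

        static void Main()
        {
            // Sample counts, invented for this example.
            var words = new Dictionary<string, int>
            {
                { "the", 5 }, { "speling", 1 }, { "text", 3 }
            };

            // Prints "the [5]", "text [3]", "speling [1]", one per line.
            foreach (KeyValuePair<string, int> entry in SortByFrequency(words))
                Console.WriteLine("{0} [{1}]", entry.Key, entry.Value);
        }
    }
    ```

    Both forms use a stable sort, so words with equal counts keep their original relative order.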

    So your whole function then becomes this:

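    // Count how often each word occurs.
    foreach (Match match in wordPattern.Matches(str))
    {
        if (!words.ContainsKey(match.Value))
            words.Add(match.Value, 1);
        else
            words[match.Value]++;
    }
    
    // Sort by frequency, highest first.
    IOrderedEnumerable<KeyValuePair<string, int>> dictByFreq = words.OrderBy<KeyValuePair<string, int>, int>((KeyValuePair<string, int> kvp) => -kvp.Value);
    
    // IOrderedEnumerable has no Count property, so no initial capacity is passed here.
    StringBuilder resultSb = new StringBuilder();
    StringBuilder mistakesSb = new StringBuilder();
    
    // Hunspell implements IDisposable, so release it when done.
    using (Hunspell hunspell = new Hunspell("en_US.aff", "en_US.dic"))
    {
        foreach (KeyValuePair<string, int> entry in dictByFreq)
        {
            resultSb.AppendLine(string.Format("{0} [{1}]", entry.Key, entry.Value));
    
            // Collect only the words Hunspell rejects, in their own buffer.
            if (!hunspell.Spell(entry.Key))
                mistakesSb.AppendLine(entry.Key);
        }
    }
    
    // Update each text box once, after the loop.
    richTextBox2.Text = resultSb.ToString();
    richTextBox4.Text = mistakesSb.ToString();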
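
    To check the filtering step without the form or the dictionary files, the spell check can be passed in as a delegate. This is only a sketch: MisspelledWordsSketch, FindMisspelled, and the small HashSet standing in for a dictionary are hypothetical names and data introduced for this example. With NHunspell you would pass hunspell.Spell as the delegate instead.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class MisspelledWordsSketch
    {
        // Returns the words rejected by isCorrect, preserving input order.
        public static List<string> FindMisspelled(IEnumerable<string> words, Func<string, bool> isCorrect)
        {
            var misspelled = new List<string>();
            foreach (string word in words)
            {
                if (!isCorrect(word))
                    misspelled.Add(word);
            }
            return misspelled;
        }

        static void Main()
        {
            // Stand-in for a real dictionary (assumption for this sketch).
            var known = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "some", "text", "here" };

            List<string> result = FindMisspelled(
                new[] { "some", "txet", "here", "wrods" },
                w => known.Contains(w));

            // Prints "txet" and "wrods", one per line; the same string could be assigned to richTextBox4.Text.
            Console.WriteLine(string.Join(Environment.NewLine, result));
        }
    }
    ```

    Keeping the misspelled words in their own list means the frequency output and the error output never share a buffer, so richTextBox4 can only ever receive the rejected words.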