Search code examples
c++cword-frequency

Word Frequency Statistics


In an pre-interview, I am faced with a question like this:

Given a string consists of words separated by a single white space, print out the words in descending order sorted by the number of times they appear in the string.

For example an input string of “a b b” would generate the following output:

b : 2
a : 1

Firstly, I'd say it is not so clear that whether the input string is made up of single-letter words or multiple-letter words. If the former is the case, it could be simple.

Here is my thought:

int c[26] = {0};
char *pIn = strIn;

while (*pIn != 0 && *pIn != ' ')
{
    ++c[*pIn];
    ++pIn;
}

/* how to sort the array c[26] and remember the original index? */

I can get the statistics of the frequecy of every single-letter word in the input string, and I can get it sorted (using QuickSort or whatever). But after the count array is sorted, how to get the single-letter word associated with the count so that I can print them out in pair later?

If the input string is made of of multiple-letter word, I plan to use a map<const char *, int> to track the frequency. But again, how to sort the map's key-value pair?

The question is in C or C++, and any suggestion is welcome.

Thanks!


Solution

  • I would use a std::map<std::string, int> to store the words and their counts. Then I would use something this to get the words:

    while(std::cin >> word) {
        // increment map's count for that word
    }
    

    finally, you just need to figure out how to print them in order of frequency, I'll leave that as an exercise for you.