I need to calculate how many times every character occurs in a given string. I need to do it in C or C++, and I can use any library. Since I am not a C/C++ developer, I am not sure my code is optimal. I want the best-performing algorithm; that is the main reason for this question.
I am using the following code at the moment:
using namespace std;
...
char* text; // some text, may be very long
int text_length; // I know this value, if it can help
map<char,int> table;
map<char,int>::iterator it;
char c;
for (int i = 0; (c = text[i]) != '\0'; i++) {
    it = table.find(c);
    if (it == table.end()) {
        table[c] = 1;
    } else {
        table[c]++;
    }
}
I could use a different structure instead of std::map, but I do not know which one would be better.
Thanks for your help!
You are doing it right: this is essentially the counting phase of a bucket sort (a histogram). There cannot be a faster (non-parallel) algorithm for counting elements of a finite universe (such as characters), since every character must be examined at least once.
If your characters are single bytes, you can use a simple array int table[256] and avoid the overhead of the C++ containers entirely. Index it with an unsigned char cast, because plain char may be signed and a negative index is undefined behavior.
Using Duff's device (which can actually be slower on modern CPUs, so measure before adopting it):
int table[256];
memset(table, 0, sizeof(table));
if (text_length > 0) {          // guard: the do-while would otherwise run once
    int iterations = (text_length + 7) / 8;
    switch (text_length % 8) {
    case 0: do { table[(unsigned char)*text++]++;
    case 7:      table[(unsigned char)*text++]++;
    case 6:      table[(unsigned char)*text++]++;
    case 5:      table[(unsigned char)*text++]++;
    case 4:      table[(unsigned char)*text++]++;
    case 3:      table[(unsigned char)*text++]++;
    case 2:      table[(unsigned char)*text++]++;
    case 1:      table[(unsigned char)*text++]++;
            } while (--iterations > 0);
    }
}
Update: As MRAB remarked, processing chunks of the text in parallel might give you a performance boost. But be aware that creating a thread is quite expensive, so you should measure the smallest input size that justifies the thread-creation cost.