I'm working on a project which generates a very large number of sequential text strings in a very tight loop. My application makes heavy use of SIMD instruction set extensions like SSE and MMX in other parts of the program, but the key generator is plain C++.
My key generator is a keyGenerator class, which holds a single char array that stores the current key. To get the next key, there is a function called "incrementKey," which treats the string as a number and adds one to it, carrying where necessary.
Now, the problem is, the keygen is somewhat of a bottleneck. It's fast, but it would be nice if it were faster. One of the biggest problems is that when I'm generating a set of sequential keys to be processed using my SSE2 code, I have to have the entire set stored in an array, which means I have to sequentially generate and copy 12 strings into an array, one by one, like so:
    char* keys[12];
    for(int i = 0; i < 12; i++)
    {
        keys[i] = new char[16];
        strcpy(keys[i], keygen++);
    }
So how would you efficiently generate these plaintext strings in order? I need some ideas to help move this along. Concurrency would be nice; as my code is right now, each successive key depends on the previous one, which means that the processor can't start work on the next key until the current one has been completely generated.
Here is the code relevant to the key generator:
    class keyGenerator
    {
    public:
        keyGenerator(unsigned long long location, characterSet* charset)
            : location(location), charset(charset)
        {
            for(int i = 0; i < 16; i++)
                key[i] = 0;
            charsetStr = charset->getCharsetStr();
            integerToKey();
        }

        inline void incrementKey()
        {
            register size_t keyLength = strlen(key);
            for(register char* place = key; place; place++)
            {
                if(*place == charset->maxChar)
                {
                    // Overflow, reset char at place
                    *place = charset->minChar;
                    if(!*(place+1))
                    {
                        // Carry, no space, insert char
                        *(place+1) = charset->minChar;
                        break;
                    }
                }
                else
                {
                    // Space available, increment char at place
                    if(*place == charset->charSecEnd[0]) *place = charset->charSecBegin[0];
                    else if(*place == charset->charSecEnd[1]) *place = charset->charSecBegin[1];
                    (*place)++;
                    break;
                }
            }
        }

        inline char* operator++() // Pre-increment
        {
            incrementKey();
            return key;
        }

        inline char* operator++(int) // Post-increment
        {
            memcpy(postIncrementRetval, key, 16);
            incrementKey();
            return postIncrementRetval;
        }

        void integerToKey()
        {
            register unsigned long long num = location;
            if(!num)
                key[0] = charsetStr[0];
            while(num)
            {
                unsigned int remainder = num % charset->length;
                num /= charset->length;
                key[strlen(key)] = charsetStr[remainder];
            }
        }

        inline unsigned long long keyToInteger() { return 0; }
        inline char* getKey() { return key; }

    private:
        unsigned long long location;
        characterSet* charset;
        std::string charsetStr;
        char key[16];
        // We need a place to store the key for the post increment operation.
        char postIncrementRetval[16];
    };
    struct characterSet
    {
        characterSet(unsigned int len, int min, int max, int charsec0, int charsec1, int charsec2, int charsec3)
        {
            // Pass the len parameter; the length member is not initialized yet.
            init(len, min, max, charsec0, charsec1, charsec2, charsec3);
        }

        void init(unsigned int len, int min, int max, int charsec0, int charsec1, int charsec2, int charsec3)
        {
            length = len;
            minChar = min;
            maxChar = max;
            charSecEnd[0] = charsec0;
            charSecBegin[0] = charsec1;
            charSecEnd[1] = charsec2;
            charSecBegin[1] = charsec3;
        }

        std::string getCharsetStr()
        {
            std::string retval;
            for(int chr = minChar; chr != maxChar; chr++)
            {
                for(int i = 0; i < 2; i++) if(chr == charSecEnd[i]) chr = charSecBegin[i];
                retval += chr;
            }
            return retval;
        }

        int minChar, maxChar;
        // charSec = character set section
        int charSecEnd[2], charSecBegin[2];
        unsigned int length;
    };
Well, performance-wise, all the new/strcpy/strcmp calls are probably hurting you much more than your keygen.
Allocate your memory in one larger pool up front and then use pointers into it.
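A minimal sketch of the pooling idea, assuming the batch of 12 keys of 16 bytes each from the question. `KeyPool` and `slot` are hypothetical names used only for illustration; they are not part of the original code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: one contiguous block holds every key slot,
// so no per-key new/delete is needed.
struct KeyPool
{
    static const std::size_t kKeyCount = 12; // batch size taken from the question
    static const std::size_t kKeySize  = 16; // bytes per key, terminator included

    char storage[kKeyCount * kKeySize];

    // Zero the whole pool so every slot starts as an empty string.
    KeyPool() { std::memset(storage, 0, sizeof(storage)); }

    // Pointer to the i-th 16-byte slot inside the pool.
    char* slot(std::size_t i) { return storage + i * kKeySize; }
};
```

With something like this, each `strcpy(keys[i], keygen++)` in the question could target `pool.slot(i)` instead of a freshly allocated buffer, and the pool can be reused for every batch.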
With the keygen, avoid sticking to the leaky abstraction of producing a single key per call. Instead, produce the optimal number of keys at a time, possibly a larger multiple of it.
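One way to read this is a generator that fills a whole batch per call. The sketch below is only an illustration under simplifying assumptions: the charset is treated as one contiguous range from `minChar` to `maxChar` (the question's charSec gaps are ignored), keys are 16 bytes with the least significant character first, and `incrementKey` and `fillBatch` are hypothetical free functions rather than the question's class members.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t kKeySize = 16; // bytes per key, terminator included

// Simplified increment over a contiguous range [minChar, maxChar].
// The least significant character comes first, as in the question.
inline void incrementKey(char* key, char minChar, char maxChar)
{
    for (std::size_t i = 0; i + 1 < kKeySize; ++i)
    {
        if (key[i] == '\0') { key[i] = minChar; return; } // grow by one character
        if (key[i] != maxChar) { ++key[i]; return; }      // no carry needed
        key[i] = minChar;                                 // overflow: reset and carry on
    }
}

// Writes `count` consecutive keys into `out` (count * kKeySize bytes)
// and leaves `current` pointing at the key that follows the batch.
inline void fillBatch(char* current, char* out, std::size_t count,
                      char minChar, char maxChar)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        std::memcpy(out + i * kKeySize, current, kKeySize);
        incrementKey(current, minChar, maxChar);
    }
}
```

The caller then asks for 12 keys (or whatever multiple suits the SSE2 code) in a single call, and the output buffer can be the same preallocated pool on every iteration.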
Over certain intervals you can actually use SSE/MMX to produce the keys, at least when the string is aligned and its length is divisible by your SSE/MMX word length. If it is not, you could also try padding it with 0's and then shifting them away. It's probably not really worth the effort if you only generate 16 at a time.
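As a rough illustration of such an interval, consider a run of keys in which only the first character changes and no carry occurs. Assuming each key is zero-padded to 16 bytes (one 128-bit register) and the charset is contiguous over that run, every key is the base key plus a small offset in byte 0. `fillRunNoCarry` is a hypothetical helper; the intrinsics are standard SSE2 ones from `<emmintrin.h>`, with a scalar fallback for builds without SSE2.

```cpp
#include <cstddef>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define KEYGEN_HAVE_SSE2 1
#endif

// Hypothetical helper: writes `count` keys that differ from `base` only in
// their first character. The caller must guarantee that base[0] + count - 1
// stays within the charset, i.e. that no carry happens during this run.
inline void fillRunNoCarry(const char* base, char* out, int count)
{
#ifdef KEYGEN_HAVE_SSE2
    const __m128i key = _mm_loadu_si128(reinterpret_cast<const __m128i*>(base));
    for (int i = 0; i < count; ++i)
    {
        // The low byte holds i and every other byte is zero,
        // so the byte-wise add changes only key[0].
        const __m128i next = _mm_add_epi8(key, _mm_cvtsi32_si128(i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i * 16), next);
    }
#else
    for (int i = 0; i < count; ++i) // scalar fallback without SSE2
    {
        std::memcpy(out + i * 16, base, 16);
        out[i * 16] = static_cast<char>(base[0] + i);
    }
#endif
}
```

Carries and charset gaps would still have to be handled by the scalar increment between runs, which is part of why the gain may be small for short batches.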