I have automatically generated a huge, but very simple .cpp file. It defines a class:
#include <QString>
#include <map>
class CTrigramFrequencyTable_English
{
public:
CTrigramFrequencyTable_English();
private:
std::map<QString /*trigram*/, quint64 /*count*/> _trigramFrequencyTable;
const quint64 _totalTrigramCount;
};
and puts 10k lines of the following kind in the constructor:
_trigramFrequencyTable[QString("and")] = 48760ull;
I started compiling this .cpp file about 10 minutes ago, and it's still going. Is there any way to achieve what I want while reducing compilation time? And why is it taking so long in the first place? I've seen quite a few libraries with 3k-5k lines of regular code, even with templates, and they compiled very fast.
Bottom line: I don't want to put my data into a resource file and parse that file at run time; I want to compile the data directly into the binary.
P.S. The 10k-line file compiles in about 30 seconds in the debug configuration; in release, I waited 10 minutes and then terminated the process.
In my experience (in MELT, with recent GCC, e.g. 4.8 or 4.9) with generated C++ (rather C-like) code, the compilation time of a routine grows quadratically with the size (number of lines) of that routine as soon as you ask the compiler to optimize.
Register allocation and instruction scheduling algorithms inside any optimizing compiler are hard and complex!
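That estimate also explains why splitting helps. Taking it at face value as a rough model (compile cost about c·n² for an n-line routine, where c is an unknown constant), cutting one n-line routine into k routines of n/k lines each divides the optimization cost by k; for the question's 10k lines split into routines of 1,000 lines:

```latex
k \cdot c\left(\frac{n}{k}\right)^{2} = \frac{c\,n^{2}}{k},
\qquad n = 10\,000,\; k = 10:\quad
10 \cdot c \cdot 1000^{2} = 10^{7}c
\;\text{ versus }\;
c \cdot 10\,000^{2} = 10^{8}c .
```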
In your particular case, you should consider changing your C++-generating script to emit something like:
#include <stddef.h> /* for NULL */
struct my_trigram_pair_st {
  const char* name;
  unsigned long long freq;
};
const struct my_trigram_pair_st my_trigrams[] = {
  { "and", 48760ull },
  // zillions of similar lines
  { NULL, 0 } /* sentinel: freq is an integer, so 0 rather than NULL */
};
and preferably emit that as C (not C++) code. It can be C code, since const char* is a plain C string (for literal strings like "and") and freq is a plain number. Also change your generator to emit legal C99 strings: don't emit Ô inside them, but its UTF-8 bytes as \303\224 or preferably \xc3\x94 (a hex escape keeps consuming hex digits, so when one is followed by a hex digit, split the literal, e.g. "\xc3\x94" "abc").
Then adjust your C++ program to use it (the same struct my_trigram_pair_st declaration must be visible there too, e.g. from a shared header):
extern "C" const struct my_trigram_pair_st my_trigrams[];
// inside the CTrigramFrequencyTable_English constructor:
for (int i = 0; my_trigrams[i].name != nullptr; i++)
  _trigramFrequencyTable[QString::fromUtf8(my_trigrams[i].name)]
    = my_trigrams[i].freq;
Here you are converting UTF-8 const char* strings to QStrings at run time.
If your script must generate functions, make it split them into smaller functions (e.g. at most a thousand lines each).
Alternatively, put your huge data in e.g. an SQLite and/or JSON file (you could even have an SQLite file with JSON inside).
You could also disable optimizations when compiling that particular file (e.g. with GCC's -O0), or you could wait much longer (hours).