Suppose you want to read the data from a large text file (~300 MB) into an array of vectors, vector<string> *Data (assume that the number of columns is known).
    // file is opened with ifstream; initial value of s is set up, etc.
    Data = new vector<string>[col];
    string u;
    int i = 0;
    do
    {
        // parse the current line into whitespace-separated tokens
        istringstream iLine(s);
        i = 0;
        while (iLine >> u)
        {
            Data[i].push_back(u);   // token i goes into column i
            i++;
        }
    }
    while (getline(file, s));
This code works fine for small files (<50 MB), but memory usage grows far out of proportion to the file size when reading a large one. I'm fairly sure the problem is that an istringstream object is created on every pass through the loop. However, defining a single istringstream iLine; outside of both loops, feeding each line into it with iLine.str(s);, and resetting it after the inner while-loop (iLine.str(""); iLine.clear();) causes a memory explosion of the same order.
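For concreteness, here is a minimal sketch of that reuse pattern (same variables as above):

    istringstream iLine;          // constructed once, outside both loops
    do
    {
        iLine.str(s);             // load the current line
        i = 0;
        while (iLine >> u)
        {
            Data[i].push_back(u);
            i++;
        }
        iLine.str("");            // drop the buffer...
        iLine.clear();            // ...and reset the eof/fail flags
    }
    while (getline(file, s));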
The question that arises: why does istringstream behave this way? Thank you.
EDIT: In response to the first answer, I do free the memory allocated by the array later in the code:
    for (long i = 0; i < col; i++)
        Data[i].clear();
    delete [] Data;
FULL COMPILE-READY CODE (headers included):
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <tchar.h>

    using namespace std;

    int _tmain(int argc, _TCHAR* argv[])
    {
        // generate a test file: ~1,000,000 rows of 99 two-digit numbers
        ofstream testfile;
        testfile.open("testdata.txt");
        srand(time(NULL));
        for (int i = 1; i < 1000000; i++)
        {
            for (int j = 1; j < 100; j++)
            {
                testfile << rand() % 100 << " ";
            }
            testfile << endl;
        }
        testfile.close();

        vector<string> *Data;
        clock_t begin = clock();
        ifstream file("testdata.txt");
        string s;

        // count the columns from the first line
        getline(file, s);
        istringstream iss(s);
        string nums;
        int col = 0;
        while (iss >> nums)
        {
            col++;
        }
        cout << "Columns #: " << col << endl;

        // read the whole file, one vector per column
        Data = new vector<string>[col];
        string u;
        int i = 0;
        do
        {
            istringstream iLine(s);
            i = 0;
            while (iLine >> u)
            {
                Data[i].push_back(u);
                i++;
            }
        }
        while (getline(file, s));
        cout << "Rows #: " << Data[0].size() << endl;

        // free the data
        for (long i = 0; i < col; i++)
            Data[i].clear();
        delete [] Data;

        clock_t end = clock();
        double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
        cout << elapsed_secs << endl;
        getchar();
        return 0;
    }
vector<> grows its memory geometrically. A typical pattern is to double the capacity whenever it needs to grow. That can leave a lot of extra space allocated but unused if your loop ends right after such a threshold. You could try calling shrink_to_fit() on each vector when you are done.
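As a sketch, assuming C++11 (shrink_to_fit() does not appear in the question's code):

    // after the read loop, trim each column's capacity to its size
    for (long i = 0; i < col; i++)
        Data[i].shrink_to_fit();   // non-binding request (C++11)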
Additionally, memory allocated by the C++ allocators (or even plain malloc()) is often not returned to the OS, but kept in a process-internal free-memory pool. This can lead to further apparent growth, and it can make the effect of shrink_to_fit() invisible from outside the process.
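To illustrate (an assumption, not part of the question's setup): on glibc-based Linux systems you can explicitly ask the allocator to hand its free pool back to the OS with malloc_trim():

    #include <malloc.h>   // glibc-specific; an assumption, not portable

    // after freeing the vectors, ask glibc's allocator to release
    // free-pool memory back to the OS (not available on other platforms)
    malloc_trim(0);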
Finally, if you have lots of small strings ("2-digit numbers"), the overhead of a string object may be considerable. Even if the implementation uses a small-string optimization, I'd assume that a typical string uses no less than 16 or 24 bytes (size, capacity, data pointer or small-string buffer), and probably more on a platform where size_type is 64 bits. That is a lot of memory for 3 bytes of payload.
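You can check the fixed per-object cost on your own implementation with a couple of lines, e.g.:

    #include <iostream>
    #include <string>

    int main()
    {
        // fixed footprint of the string object itself, before any heap
        // allocation for characters that don't fit the SSO buffer
        std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n';

        // capacity of a default-constructed string hints at the SSO buffer size
        std::cout << "SSO capacity        = " << std::string().capacity() << '\n';
        return 0;
    }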
So I assume you are seeing the normal behaviour of vector<>.