I'm importing a large CSV file into GNU Octave, doing some simple data manipulation, and creating some plots. The file has about 6.5 million rows. I expected reading the file to take about two to three hours, because that's roughly how long it usually takes to create a file this size in my experience. When it wasn't finishing, I added a status counter and found that it was slowing down as it read: after 12 hours it was only at line 1.5 million and moving at a crawl. According to Resource Monitor, though, there are no memory issues.

Is there a more efficient way to read the file than the code I have below? Do I need to do something special to allocate memory to the process so it doesn't slow down?

This is the loop that reads the CSV. It scans the file one line at a time, extracts the columns I need, and exits when it reaches the end of the file:
% Process File
F = 1;
while 1
  % Status Counter
  printf ("Status: %d \r", F);
  fflush (stdout);
  F = F + 1;
  % Read next unread line
  line = fgetl (fileID);
  % Exit the loop at end of file (fgetl returns -1, which is not a char)
  if ~ischar (line)
    break;
  endif
  % Parse the line into numeric fields
  Bank = textscan (line, '%f', 'Delimiter', ',');
  Bank = cell2mat (Bank);
  Bank = transpose (Bank);
  % Append the selected columns of Bank to Output
  Output = [Output; Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];
endwhile
This is the slow part:
Output = [Output; Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];
What you do here is create a new matrix, copy Output and the new row into it, and assign the result back to Output. As Output becomes larger, the copy becomes increasingly expensive: appending row n copies all n-1 rows already stored, so building the matrix one row at a time costs time proportional to the square of the row count. With 6.5 million rows of 13 columns each, that works out to on the order of 10^14 element copies, which is why the loop keeps slowing down instead of running at a steady pace.
What you need to do is preallocate the output array. Always preallocate!
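Below is a minimal sketch of the same loop with preallocation. It assumes fileID is already open, keeps the same 13 columns, and uses nRowsEstimate as a placeholder upper bound on the row count that you would set yourself (your ~6.5 million is a natural choice). It also prints the status counter only every 10,000 lines, since calling printf and fflush on every single iteration adds overhead of its own:

% Preallocate: 13 columns are kept per row (1:9, 13:14, 20:21)
nRowsEstimate = 6500000;   % placeholder upper bound on the row count
Output = zeros (nRowsEstimate, 13);
F = 0;                     % rows actually stored so far
while 1
  line = fgetl (fileID);
  if ~ischar (line)        % fgetl returns -1 at end of file
    break;
  endif
  Bank = textscan (line, '%f', 'Delimiter', ',');
  Bank = transpose (cell2mat (Bank));
  F = F + 1;
  % Write into the preallocated row instead of concatenating
  Output(F, :) = [Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];
  % Status counter, printed only occasionally
  if mod (F, 10000) == 0
    printf ("Status: %d \r", F);
    fflush (stdout);
  endif
endwhile
% Trim the unused preallocated rows
Output = Output(1:F, :);

If a good upper bound is hard to come by, a common variant is to double the size of Output whenever it fills up and trim once at the end; the total copying then grows in proportion to the final size rather than its square.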