Question: What is the best way to parse files that are missing the newline character at the end of the file? Should I just use a try/catch on OutOfMemoryException, or is there a better way?
Background: I am parsing log files using StreamReader's ReadLine() method to read the next line. So, the basic loop structure looks like this:
string line;
while ((line = sr.ReadLine()) != null)
{
    // Parse the current line of the file
}
This works well, even on large files (e.g., > 2GB). But when the next line is not null and does not end with a newline character, StreamReader just reads blank spaces until all memory is consumed and an OutOfMemoryException is thrown. Is catching that exception the best way to handle a missing newline character at the end of the file, or are there better ways of handling this problem?
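One workaround I have been considering (just a sketch; ParseLine, path, and the 1 MB cap are placeholders made up for illustration) is to read characters directly and enforce a maximum line length, so a corrupt tail fails fast instead of consuming all memory:

// Requires: using System.IO; using System.Text;
const int MaxLineLength = 1024 * 1024;  // assumed 1 MB cap per line; adjust to the real log format

using (var sr = new StreamReader(path))  // path is a placeholder for the log file path
{
    var sb = new StringBuilder();
    int ch;
    while ((ch = sr.Read()) != -1)  // reads one character at a time
    {
        if (ch == '\n')
        {
            ParseLine(sb.ToString().TrimEnd('\r'));  // ParseLine is a hypothetical parse routine
            sb.Clear();
        }
        else
        {
            if (sb.Length >= MaxLineLength)
                throw new InvalidDataException("Line exceeds the expected maximum length; the file is probably corrupt.");
            sb.Append((char)ch);
        }
    }
    if (sb.Length > 0)
        ParseLine(sb.ToString().TrimEnd('\r'));  // handle a final line with no trailing newline
}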
Note: the file is being created by an IIS Exchange Server. Without digging in with our IT group, the file appears to be cut off mid-creation, resulting in the last row being bad because it is missing data.
Research: I found a posting on SO (see below) that refers to using File.ReadFile. While it works on a much smaller file (e.g., < 2GB) that is missing the newline character, it still fails on large files (e.g., > 2GB).
https://stackoverflow.com/a/13416225
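For reference, the streaming counterpart would look roughly like the snippet below; File.ReadLines enumerates the file lazily rather than loading it all at once, though I am not certain it matches the linked answer exactly or behaves any differently on the bad last row:

// Requires: using System.IO;
foreach (string line in File.ReadLines(path))  // path is a placeholder
{
    // Parse the line
}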
Edit
Execution stops at the while line in the code sample above. The problem is not with the code, but with the file. I cannot post our log files. But, to demonstrate, create a few rows of data in Notepad++, remove the newline character from the last row of the file, and then run the parser against it. StreamReader will blow up on the last row because it cannot find the end of the row.
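If it helps with reproducing this, the same kind of test file can be generated in code instead of Notepad++ (the path and row contents below are made up):

// Requires: using System.IO;
// Writes three rows; the last row has no trailing newline.
File.WriteAllText(@"C:\temp\truncated.log",
    "2017-01-01 00:00:01 row1\r\n" +
    "2017-01-01 00:00:02 row2\r\n" +
    "2017-01-01 00:00:03 row3");  // no newline after the last row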
Below is a copy of the log file with all data contents removed, except for the timestamp and the newline character at the end of each row. For the last row, I included the last data element (the port number) before the data cuts off. Notice that the last row is missing the newline character.
I have confirmed with our IT group that the file was bad. What happened is that the original transfer of the file over the network to my local machine seems to have experienced a hiccup. I re-transferred the file and it parsed successfully; the re-transferred copy also contains more rows. What threw me off was that the file sizes on the network and on my local machine were identical, so I did not consider re-transferring the file during my research.
The file transfer process seems to first allocate the full file as empty and then start filling it with data. Good luck diagnosing this on extremely large files, which cannot be opened in standard text editors (e.g., Notepad, Notepad++, Excel) to see it. I had to use UltraEdit before the problem became visible.
Per Hans Passant's comment on a related question (see link below), StreamReader's ReadLine() method will handle large files just fine, as it handles the file-system caching internally, so OutOfMemoryExceptions should not be a problem. I assume his comment was aimed at computers with insufficient memory, as opposed to bad files.
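With that in mind, a more targeted safeguard against a bad file than catching OutOfMemoryException is probably to validate each row as it is parsed; the space delimiter and ExpectedFieldCount below are assumptions I am making about the log format:

// Requires: using System.IO;
const int ExpectedFieldCount = 15;  // hypothetical; set to the real number of fields per row

while ((line = sr.ReadLine()) != null)
{
    string[] fields = line.Split(' ');  // delimiter assumed for illustration
    if (fields.Length < ExpectedFieldCount)
    {
        // Truncated or malformed row (e.g., a cut-off last line): log it and skip.
        continue;
    }
    // Parse the fields
}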
Thank you all for the troubleshooting and my apologies for any interruption.