I have a tab-delimited file with 8,000,000+ rows, some of which contain rogue tabs.
For example:
a->b->c->d
a->b->c->-->-->--d
a->b->c->d
a->b->c->d
I have a method to rectify the rogue tabs (3 tabs to 1 tab) as follows:
string text = File.ReadAllText(filePath);
text = text.Replace("\t\t\t", "\t");
File.WriteAllText(filePath, text);
The above code block produces the following error:
An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
How can I read and write just one row at a time so that the whole file is not in memory?
File.ReadLines gives you a lazy IEnumerable<string>. You can iterate over that instead and only load one line at a time.
You'll need to write to a different file than you read from, though. When you finish, you can delete the original and rename the new file.
Here's a one-liner that processes the file:
// Requires: using System.IO; using System.Linq;
File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
        .Select(line => line.Replace("\t\t\t", "\t")));
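If you want the result to end up under the original file name, the delete/rename step mentioned above could look roughly like the sketch below. The class name TabFixer, the method name CollapseTripleTabs, and the ".tmp" suffix are made up for illustration. It assumes the temporary path is free to use and that the file is UTF-8, since this overload of File.WriteAllLines writes UTF-8.

```csharp
using System;
using System.IO;
using System.Linq;

public static class TabFixer
{
    // Hypothetical helper: streams the file one line at a time, collapses each
    // run of three tabs into a single tab, then swaps the result in for the
    // original file.
    public static void CollapseTripleTabs(string inputPath)
    {
        // Assumption: a sibling ".tmp" path does not clash with an existing file.
        string tempPath = inputPath + ".tmp";

        // ReadLines is lazy, so only the current line is held in memory.
        File.WriteAllLines(tempPath,
            File.ReadLines(inputPath)
                .Select(line => line.Replace("\t\t\t", "\t")));

        // Both files are closed once WriteAllLines returns, so the original
        // can be deleted and the temporary file renamed into its place.
        File.Delete(inputPath);
        File.Move(tempPath, inputPath);
    }
}
```

You would call it as TabFixer.CollapseTripleTabs(filePath). Note that if the process dies between the delete and the move, only the ".tmp" file remains, so keep a backup if the data matters.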