Given that RAM is much faster than a hard drive, I was surprised by the code below.
I was trying to split a CSV file based on the value of one column, writing each line to a different file depending on the value in that cell.
My first attempt was:
    List<string> protocolTypes = new List<string>();
    List<string> splitByProtocol = new List<string>();
    foreach (string s in lineSplit)
    {
        string protocol = getProtocol();
        index = protocolTypes.IndexOf(protocol);
        splitByProtocol[index] = splitByProtocol[index] + s + "\n";
    }
This took ages, but switching to one StreamWriter per protocol was much faster:
    List<string> protocolTypes = new List<string>();
    List<StreamWriter> splitByProtocol = new List<StreamWriter>();
    foreach (string s in lineSplit)
    {
        string protocol = getProtocol();
        index = protocolTypes.IndexOf(protocol);
        splitByProtocol[index].WriteLine(s);
    }
Why is writing to disk so much faster than appending strings together in memory? I know that adding to a string requires copying the whole string to a new memory location, but appending to a string was orders of magnitude slower than writing to disk, which seems counterintuitive.
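Each append allocates a new string and copies everything accumulated so far, so the total amount of copying grows roughly with the square of the number of lines. Once the strings become huge (many MB), that copying definitely becomes time-consuming. A StreamWriter, by contrast, copies each line once into a buffer and by default hands the data to the file in larger chunks, so most WriteLine calls never wait on the disk.
However, the biggest hit may come from the many old strings that are no longer needed and sit on the heap as garbage, waiting to be collected. The garbage collector will kick in, possibly many times, pausing your program each time.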
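To see both effects on your own machine, a rough timing sketch along these lines may help. It is not a rigorous benchmark: the class name `AppendCost`, the 100-character line, and the count of 5000 are all made up for illustration, and the numbers will vary with runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class AppendCost
{
    // Builds one big string with +, which copies the whole accumulated text on every append.
    public static string WithConcat(string line, int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
            result = result + line + "\n";
        return result;
    }

    // Builds the same text with a StringBuilder, which appends into a growable buffer.
    public static string WithBuilder(string line, int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(line).Append('\n');
        return sb.ToString();
    }

    // Runs an action and prints its elapsed time plus the change in the
    // generation 0 collection counter while it ran.
    public static void Measure(string label, Action action)
    {
        int gcBefore = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms, " +
                          $"{GC.CollectionCount(0) - gcBefore} collections");
    }

    static void Main()
    {
        string line = new string('x', 100); // hypothetical 100-character CSV line
        const int count = 5000;             // arbitrary; raise it to widen the gap
        Measure("string +", () => WithConcat(line, count));
        Measure("StringBuilder", () => WithBuilder(line, count));
    }
}
```

Doubling `count` should roughly quadruple the time of the concatenation version while only about doubling the StringBuilder version, which is the quadratic copying described above.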
For strings built up in a loop like this, always consider using StringBuilder instead. To match your example code:
    List<StringBuilder> splitByProtocol = new List<StringBuilder>();
    foreach (string s in lineSplit)
    {
        string protocol = getProtocol();
        index = protocolTypes.IndexOf(protocol);
        // AppendLine adds to the builder's internal buffer instead of creating a new string
        splitByProtocol[index].AppendLine(s);
    }
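If you want something self-contained to experiment with, here is a minimal sketch of the same idea using a dictionary keyed by the column value, which also avoids the linear IndexOf lookup. It is not your actual code: `CsvSplitter`, `SplitByColumn`, and the sample data are hypothetical, and it assumes plain comma-separated lines with no quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvSplitter
{
    // Groups lines by the value in one column, with one StringBuilder per distinct value.
    // Assumes plain comma-separated lines with no quoted fields (a simplification).
    public static Dictionary<string, StringBuilder> SplitByColumn(
        IEnumerable<string> lines, int columnIndex)
    {
        var buffers = new Dictionary<string, StringBuilder>();
        foreach (string line in lines)
        {
            string[] cells = line.Split(',');
            if (columnIndex >= cells.Length)
                continue; // skip lines too short to contain the column

            string key = cells[columnIndex];
            if (!buffers.TryGetValue(key, out var sb))
            {
                // First time this value is seen: create its buffer.
                sb = new StringBuilder();
                buffers[key] = sb;
            }
            sb.Append(line).Append('\n');
        }
        return buffers;
    }

    static void Main()
    {
        // Made-up sample rows; the protocol sits in column 1.
        string[] lines = { "1,tcp,80", "2,udp,53", "3,tcp,443" };
        foreach (var pair in SplitByColumn(lines, 1))
            Console.Write($"[{pair.Key}]\n{pair.Value}");
    }
}
```

Each buffer can then be written out in one go, for example with File.WriteAllText. For input too large to hold in memory, the StreamWriter version from the question is probably still the better fit.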