I need to write and read big CSV (comma separated value) files, which basically contain integer values converted to strings. To read such files efficiently, .NET Core has introduced a new Parse method for the type int:
public static int Parse (ReadOnlySpan<char> s,
System.Globalization.NumberStyles style =
System.Globalization.NumberStyles.Integer, IFormatProvider provider = null);
This allows using a StreamReader to read the characters of the file into a character array. My program then has to find the positions of the separator characters, create a ReadOnlySpan containing the characters between two separators and convert them into an int, without first creating a string out of these characters. Since my files contain millions of values, avoiding the creation of millions of strings should result in faster file reading. I hope.
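The reading approach described above could look roughly like the following sketch. The class CsvIntReader and the method ParseLine are hypothetical names made up for this illustration; it assumes the line is already in a char buffer and that the caller supplies a destination big enough for all values on the line.

```csharp
using System;
using System.Globalization;

public static class CsvIntReader {
    // Hypothetical helper: parses the integers of one line into 'destination'
    // and returns how many values were found. No strings are created.
    // Throws if 'destination' is too small or a field is not a valid int.
    public static int ParseLine(ReadOnlySpan<char> line, char separator, Span<int> destination) {
        int count = 0;
        while (!line.IsEmpty) {
            int sepIndex = line.IndexOf(separator);
            // The field is everything up to the next separator (or the rest of the line).
            ReadOnlySpan<char> field = sepIndex < 0 ? line : line.Slice(0, sepIndex);
            if (!field.IsEmpty) {
                destination[count++] = int.Parse(field, NumberStyles.Integer, CultureInfo.InvariantCulture);
            }
            // Continue after the separator, or stop when there is none left.
            line = sepIndex < 0 ? ReadOnlySpan<char>.Empty : line.Slice(sepIndex + 1);
        }
        return count;
    }
}
```

Empty fields, such as the one after a trailing separator, are simply skipped in this sketch; real CSV data may need stricter handling.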
But how about writing the int values as strings to the file? Traditionally, it would be done like this:
var int1 = 1;
var int2 = 2;
streamWriter.WriteLine(int1.ToString() + "," + int2.ToString());
Again, for each int a string gets created and then another string for each line. This will create millions of strings that need to be garbage collected.
I would prefer something like this:
char[] charArray = getEmptyCharArray();
var span = new Span<char>(charArray);
int length1 = span.Write(int1);
charArray[length1] = ',';
span = span.Slice(length1 + 1);
int length2 = span.Write(int2);
streamWriter.Write(charArray, 0, length1 + 1 + length2);
getEmptyCharArray() provides a character array which gets reused. Unfortunately, Span has no Write() function :-(
So the question is: How can I write an int (or DateTime or Decimal or ...) into a Span without generating any garbage collected objects (strings)?
Note that any answer given before 2018 is probably not what is needed here, because System.Span was only introduced in .NET Core 2.1. Also note that the question here is about System.Span and not the HTML Span or any other Span.
Thanks to the comment from Ian Kent, I asked on https://gitter.im/dotnet/corefx and they knew the answer. It's embarrassingly simple:
var i = 1;
Span<char> span = new char[100];
var ok = i.TryFormat(span, out var charsWritten);
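Since the question also mentions DateTime and Decimal: those types have a TryFormat method with the same shape, so a whole record can be formatted into one buffer. The following is only a sketch under assumptions; the class CsvFieldWriter, the method WriteRecord, the ';' separator and the "yyyy-MM-dd" date format are choices made for this illustration.

```csharp
using System;
using System.Globalization;

public static class CsvFieldWriter {
    // Hypothetical helper: formats an int, a decimal and a DateTime into 'buffer',
    // separated by ';'. Returns the number of chars written, or -1 if the buffer
    // is too small. No intermediate strings are created.
    public static int WriteRecord(Span<char> buffer, int id, decimal amount, DateTime timestamp) {
        var culture = CultureInfo.InvariantCulture;
        int pos = 0;

        if (!id.TryFormat(buffer.Slice(pos), out int written, default, culture)) return -1;
        pos += written;
        if (pos == buffer.Length) return -1; // no room left for the separator
        buffer[pos++] = ';';

        if (!amount.TryFormat(buffer.Slice(pos), out written, default, culture)) return -1;
        pos += written;
        if (pos == buffer.Length) return -1;
        buffer[pos++] = ';';

        if (!timestamp.TryFormat(buffer.Slice(pos), out written, "yyyy-MM-dd", culture)) return -1;
        return pos + written;
    }
}
```

With a 64 character buffer, WriteRecord(buffer, 42, 12.5m, new DateTime(2018, 6, 1)) fills the first 18 characters with 42;12.5;2018-06-01. Checking the bool returned by TryFormat matters, because a buffer that is too small leaves the output incomplete.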
Because it took me a few days to find this answer and I wanted to move on with my code, I had already written my own method, which uses char[] instead of Span. I used BenchmarkRunner to measure how long the different methods take to write a 50 megabyte CSV file with 7'000'000 ints:
60 ms: Writing the same constant string. This gives a baseline for how long .NET needs just to write the file
for (int i = 0; i < iterations; i++) { streamWriter.WriteLine("1;12;123;1234;12345;123456;1234567;12345678;123;"); }
610 ms: Using ToString()
for (int i = 0; i < iterations; i++) { streamWriter.WriteLine($"{i};{i+1};{i+2};{i+3};{i+4};{i+5};{i+6};"); }
308 ms: Using TryFormat(Span)
185 ms: Using my own method and char[]
It's amazing that the string conversions take 10 times longer than writing the actual file. I would have expected the hard disk to be much slower than any software.
We are told that Span will solve many performance problems. In this test, not by much: TryFormat(Span) is about twice as fast as ToString(), but my own char[] method is faster still. It seems it would have been better if they had used char[].
Span test code
public void WriteTo4() {
  var PathFileName = directoryInfo.FullName + @"\Test1.csv";
  using (var fileStream = new FileStream(PathFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, bufferSize, FileOptions.SequentialScan)) {
    using (var streamWriter = new StreamWriter(fileStream)) {
      var lineBuffer = new char[100];
      Span<char> span = lineBuffer;
      for (int i = 0; i < iterations; i++) {
        var ok = i.TryFormat(span, out var charsWritten);
        lineBuffer[charsWritten++] = ';';
        var span1 = span[charsWritten..];
        ok = (i+1).TryFormat(span1, out charsWritten);
        span1[charsWritten++] = ';';
        span1 = span1[charsWritten..];
        ok = (i+2).TryFormat(span1, out charsWritten);
        span1[charsWritten++] = ';';
        span1 = span1[charsWritten..];
        ok = (i+3).TryFormat(span1, out charsWritten);
        span1[charsWritten++] = ';';
        span1 = span1[charsWritten..];
        ok = (i+4).TryFormat(span1, out charsWritten);
        span1[charsWritten++] = ';';
        span1 = span1[charsWritten..];
        ok = (i+5).TryFormat(span1, out charsWritten);
        span1[charsWritten++] = ';';
        span1 = span1[charsWritten..];
        ok = (i+6).TryFormat(span1, out charsWritten);
        span1[charsWritten++] = ';';
        // Note: 'ca' is never used, and a range on an array (unlike on a Span)
        // allocates a new array copy in every iteration.
        var ca = lineBuffer[..(lineBuffer.Length - span1.Length + charsWritten)];
        streamWriter.WriteLine(lineBuffer, 0, lineBuffer.Length - span1.Length + charsWritten);
      }
    }
  }
}
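The Span test above creates a new slice after every value and computes the line length from the remaining span. A variant that keeps one running position, similar to the index used in the char[] test, might be easier to follow. This is only a sketch: SpanLineWriter, AppendInt and WriteLines are hypothetical names, the writer is any TextWriter rather than the StreamWriter from the test, and I have not benchmarked it, so the timings above do not apply to it.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class SpanLineWriter {
    // Hypothetical helper: formats 'value' at position 'pos', appends ';'
    // and returns the position after the separator.
    public static int AppendInt(Span<char> buffer, int pos, int value) {
        if (!value.TryFormat(buffer.Slice(pos), out int written, default, CultureInfo.InvariantCulture)) {
            throw new ArgumentException("Buffer too small.", nameof(buffer));
        }
        pos += written;
        buffer[pos++] = ';';
        return pos;
    }

    // Writes 'iterations' lines with 7 consecutive ints each, like the tests above.
    public static void WriteLines(TextWriter writer, int iterations) {
        var lineBuffer = new char[100]; // reused for every line
        for (int i = 0; i < iterations; i++) {
            int pos = 0;
            for (int offset = 0; offset < 7; offset++) {
                pos = AppendInt(lineBuffer, pos, i + offset);
            }
            writer.WriteLine(lineBuffer, 0, pos);
        }
    }
}
```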
Test code using char[]
public void WriteTo3() {
  var PathFileName = directoryInfo.FullName + @"\Test1.csv";
  using (var fileStream = new FileStream(PathFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, bufferSize, FileOptions.SequentialScan)) {
    using (var streamWriter = new StreamWriter(fileStream)) {
      var lineBuffer = new char[100];
      for (int i = 0; i < iterations; i++) {
        var index = 0;
        lineBuffer.Write3(i, ref index);
        lineBuffer[index++] = ';';
        lineBuffer.Write3(i+1, ref index);
        lineBuffer[index++] = ';';
        lineBuffer.Write3(i+2, ref index);
        lineBuffer[index++] = ';';
        lineBuffer.Write3(i+3, ref index);
        lineBuffer[index++] = ';';
        lineBuffer.Write3(i+4, ref index);
        lineBuffer[index++] = ';';
        lineBuffer.Write3(i+5, ref index);
        lineBuffer[index++] = ';';
        lineBuffer.Write3(i+6, ref index);
        lineBuffer[index++] = ';';
        streamWriter.WriteLine(lineBuffer, 0, index);
      }
    }
  }
}
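public static void Write3(this char[] charArray, int i, ref int index) {
  if (i<0) {
    charArray[index++] = '-';
    // Note: -i overflows for int.MinValue, so that one value is not written correctly.
    i = -i;
  }
  // Write the digits in reverse order, least significant digit first.
  int start = index;
  while (i>9) {
    charArray[index++] = (char)((i % 10) + '0');
    i /= 10;
  }
  charArray[index++] = (char)(i + '0');
  // Reverse the digits in place to get the correct order.
  var end = index-1;
  while (end>start) {
    var temp = charArray[end];
    charArray[end--] = charArray[start];
    charArray[start++] = temp;
  }
}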