c# .net-core

How to write an int into System.Span, i.e. the reverse of int.Parse(span)?


I need to write and read big CSV (comma-separated values) files, which basically contain integer values converted to strings. To read such files efficiently, .NET Core introduced a new Parse overload for the type int:

public static int Parse (ReadOnlySpan<char> s,
  System.Globalization.NumberStyles style = 
  System.Globalization.NumberStyles.Integer, IFormatProvider provider = null);

This makes it possible to have a StreamReader read the characters of the file into a character array. My program then has to find the positions of the separator characters, create a ReadOnlySpan over the characters between two separators, and convert them into an int without first creating a string from these characters. Since my files contain millions of values, avoiding the creation of millions of strings should result in faster file reading, I hope.
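The reading side described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name CsvIntParser and the method ParseLine are hypothetical, a single separator character is assumed, and empty fields are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CsvIntParser
{
    // Hypothetical helper: parses every integer in 'line' that is delimited by
    // 'separator' and appends it to 'results', without creating intermediate strings.
    public static void ParseLine(ReadOnlySpan<char> line, char separator, List<int> results)
    {
        while (!line.IsEmpty)
        {
            // Find the next separator; -1 means the rest of the line is the last field.
            int pos = line.IndexOf(separator);
            ReadOnlySpan<char> field = pos < 0 ? line : line.Slice(0, pos);

            if (!field.IsEmpty)
            {
                // int.Parse accepts the span directly, so no string is allocated here.
                results.Add(int.Parse(field, NumberStyles.Integer, CultureInfo.InvariantCulture));
            }

            // Continue after the separator, or stop if there was none.
            line = pos < 0 ? ReadOnlySpan<char>.Empty : line.Slice(pos + 1);
        }
    }
}
```

With a character array filled by the StreamReader, a call could look like CsvIntParser.ParseLine(buffer.AsSpan(0, charsRead), ',', values), where buffer, charsRead and values are assumed to exist in the calling code.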

But what about writing the int values as strings to the file? Traditionally, it would be done like this:

var int1 = 1;
var int2 = 2;
streamWriter.WriteLine(int1.ToString() + "," + int2.ToString());

Again, for each int a string gets created and then another string for each line. This will create millions of strings that need to be garbage collected.

I would prefer something like this:

char[] charArray = getEmptyCharArray();
var span = new Span<char>(charArray);
int length1 = span.Write(int1);
charArray[length1] = ',';
span = span.Slice(length1 + 1);
int length2 = span.Write(int2);
streamWriter.Write(charArray, 0, length1 + 1 + length2);

getEmptyCharArray() provides a character array which gets reused.

Unfortunately, Span has no Write() function :-(

So the question is: how can I write an int (or DateTime or Decimal or ...) into a Span without creating any objects (strings) that need to be garbage collected?

Note that any answer given before 2018 is probably not what is needed here, because System.Span was only introduced in .NET Core 2.1. Also note that the question here is about System.Span and not the HTML span or any other Span.


Solution

  • Thanks to the comment from Ian Kent, I asked on https://gitter.im/dotnet/corefx and they knew the answer. It's embarrassingly simple:

    var i = 1;
    Span<char> span = new char[100];
    var ok = i.TryFormat(span, out var charsWritten);
    

    Since I didn't find this answer for some days and I wanted to move on with my code, I wrote my own method, but using char[] instead of Span. I measured with BenchmarkRunner the speed of the different methods to write a 50 megabyte CSV file with 7'000'000 ints:

    60 ms: Writing the same constant string. This gives a baseline for how long .NET needs just to write the file

    for (int i = 0; i < iterations; i++) { streamWriter.WriteLine("1;12;123;1234;12345;123456;1234567;12345678;123;"); }

    610 ms: Using ToString() (written as string interpolation in the loop below)

    for (int i = 0; i < iterations; i++) { streamWriter.WriteLine($"{i};{i+1};{i+2};{i+3};{i+4};{i+5};{i+6};"); }

    308 ms: Using TryFormat(Span)

    185 ms: Using my own method and char[]

    It's amazing that the string conversions take 10 times longer than writing the actual file. I would have expected the hard disk to be much slower than any software.

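    We are told that Span will solve many performance problems. Here it helps, but not by as much as I expected: TryFormat(Span) roughly halves the time of ToString() (308 ms vs. 610 ms), yet my own char[] method is faster still (185 ms). At least in this test, working directly on a char[] performs better.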

    Span test code

    public void WriteTo4() {
      var PathFileName = directoryInfo.FullName + @"\Test1.csv";
      using (var fileStream = new FileStream(PathFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, bufferSize, FileOptions.SequentialScan)) {
        using (var streamWriter = new StreamWriter(fileStream)) {
          var lineBuffer = new char[100];
          Span<char> span = lineBuffer;
          for (int i = 0; i < iterations; i++) {
            var ok = i.TryFormat(span, out var charsWritten);
            lineBuffer[charsWritten++] = ';';
            var span1 = span[charsWritten..];
            ok = (i+1).TryFormat(span1, out charsWritten);
            span1[charsWritten++] = ';';
            span1 = span1[charsWritten..];
            ok = (i+2).TryFormat(span1, out charsWritten);
            span1[charsWritten++] = ';';
            span1 = span1[charsWritten..];
            ok = (i+3).TryFormat(span1, out charsWritten);
            span1[charsWritten++] = ';';
            span1 = span1[charsWritten..];
            ok = (i+4).TryFormat(span1, out charsWritten);
            span1[charsWritten++] = ';';
            span1 = span1[charsWritten..];
            ok = (i+5).TryFormat(span1, out charsWritten);
            span1[charsWritten++] = ';';
            span1 = span1[charsWritten..];
            ok = (i+6).TryFormat(span1, out charsWritten);
            span1[charsWritten++] = ';';
    
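            // Note: 'ca' is never used, and a range on an array copies into a new char[],
            // so this line allocates once per iteration.
            var ca = lineBuffer[..(lineBuffer.Length - span1.Length + charsWritten)];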
            streamWriter.WriteLine(lineBuffer, 0, lineBuffer.Length - span1.Length + charsWritten);
          }
        }
      }
    }
    

    Test code using char[]

    public void WriteTo3() {
      var PathFileName = directoryInfo.FullName + @"\Test1.csv";
      using (var fileStream = new FileStream(PathFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, bufferSize, FileOptions.SequentialScan)) {
        using (var streamWriter = new StreamWriter(fileStream)) {
          var lineBuffer = new char[100];
          for (int i = 0; i < iterations; i++) {
            var index = 0;
            lineBuffer.Write3(i, ref index);
            lineBuffer[index++] = ';';
            lineBuffer.Write3(i+1, ref index);
            lineBuffer[index++] = ';';
            lineBuffer.Write3(i+2, ref index);
            lineBuffer[index++] = ';';
            lineBuffer.Write3(i+3, ref index);
            lineBuffer[index++] = ';';
            lineBuffer.Write3(i+4, ref index);
            lineBuffer[index++] = ';';
            lineBuffer.Write3(i+5, ref index);
            lineBuffer[index++] = ';';
            lineBuffer.Write3(i+6, ref index);
            lineBuffer[index++] = ';';
            streamWriter.WriteLine(lineBuffer, 0, index);
          }
        }
      }
    }
    
    
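    // Writes the decimal representation of i into charArray starting at index
    // and advances index past the last character written.
    public static void Write3(this char[] charArray, int i, ref int index) {
      // Work on the magnitude as uint so that negating int.MinValue cannot overflow.
      uint value;
      if (i<0) {
        charArray[index++] = '-';
        value = (uint)(-(long)i);
      } else {
        value = (uint)i;
      }
      int start = index;
    
      // Emit the digits least significant first ...
      while (value>9) {
        charArray[index++] = (char)((value % 10) + '0');
        value /= 10;
      }
      charArray[index++] = (char)(value + '0');
      // ... then reverse them in place.
      var end = index-1;
      while (end>start) {
        var temp = charArray[end];
        charArray[end--] = charArray[start];
        charArray[start++] = temp;
      }
    }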
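
The question also asked about DateTime and Decimal, and the Span test code above ignores the bool returned by TryFormat. A minimal sketch of how both points might be handled is shown below. The class name SpanCsvWriter and the TryAppend overloads are hypothetical helpers, not part of .NET; the invariant culture and the "yyyy-MM-dd" date format are assumptions made for the example.

```csharp
using System;
using System.Globalization;

public static class SpanCsvWriter
{
    // Each overload formats 'value' into 'buffer' at 'position', appends 'separator'
    // and advances 'position'. If the buffer is too small it returns false and
    // leaves 'position' unchanged.

    public static bool TryAppend(Span<char> buffer, ref int position, int value, char separator)
    {
        Span<char> rest = buffer.Slice(position);
        bool ok = value.TryFormat(rest, out int written, default, CultureInfo.InvariantCulture);
        return Finish(rest, ok, written, separator, ref position);
    }

    public static bool TryAppend(Span<char> buffer, ref int position, DateTime value, char separator)
    {
        Span<char> rest = buffer.Slice(position);
        // Assumed format for this example: ISO-style date without time.
        bool ok = value.TryFormat(rest, out int written, "yyyy-MM-dd", CultureInfo.InvariantCulture);
        return Finish(rest, ok, written, separator, ref position);
    }

    public static bool TryAppend(Span<char> buffer, ref int position, decimal value, char separator)
    {
        Span<char> rest = buffer.Slice(position);
        bool ok = value.TryFormat(rest, out int written, default, CultureInfo.InvariantCulture);
        return Finish(rest, ok, written, separator, ref position);
    }

    // Writes the separator after the formatted value if there is still room for it.
    private static bool Finish(Span<char> rest, bool ok, int written, char separator, ref int position)
    {
        if (!ok || written >= rest.Length)
        {
            return false;
        }
        rest[written] = separator;
        position += written + 1;
        return true;
    }
}
```

A line would then be built by calling TryAppend once per value on a reused char[] (which converts implicitly to Span<char>), followed by streamWriter.WriteLine(lineBuffer, 0, position). Whether this is faster or slower than the hand-written char[] method has not been measured here.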