Search code examples
c#performance.net-coresystem.io.pipelines

Why is this System.IO.Pipelines code much slower than Stream-based code?


I've written a little parsing program to compare the older System.IO.Stream and the newer System.IO.Pipelines in .NET Core. I'm expecting the pipelines code to be of equivalent speed or faster. However, it's about 40% slower.

The program is simple: it searches for a keyword in a 100Mb text file, and returns the line number of the keyword. Here is the Stream version:

public static async Task<int> GetLineNumberUsingStreamAsync(
    string file,
    string searchWord)
{
    using var fileStream = File.OpenRead(file);
    using var lines = new StreamReader(fileStream, bufferSize: 4096);

    int lineNumber = 1;
    // ReadLineAsync returns null on stream end, exiting the loop
    while (await lines.ReadLineAsync() is string line)
    {
        if (line.Contains(searchWord))
            return lineNumber;

        lineNumber++;
    }
    return -1;
}

I would expect the above stream code to be slower than the below pipelines code, because the stream code is encoding the bytes to a string in the StreamReader. The pipelines code avoids this by operating on bytes:

public static async Task<int> GetLineNumberUsingPipeAsync(string file, string searchWord)
{
    var searchBytes = Encoding.UTF8.GetBytes(searchWord);
    using var fileStream = File.OpenRead(file);
    var pipe = PipeReader.Create(fileStream, new StreamPipeReaderOptions(bufferSize: 4096));

    var lineNumber = 1;
    while (true)
    {
        var readResult = await pipe.ReadAsync().ConfigureAwait(false);
        var buffer = readResult.Buffer;

        if(TryFindBytesInBuffer(ref buffer, searchBytes, ref lineNumber))
        {
            return lineNumber;
        }

        pipe.AdvanceTo(buffer.End);

        if (readResult.IsCompleted) break;
    }

    await pipe.CompleteAsync();

    return -1;
}

Here are the associated helper methods:

/// <summary>
/// Look for `searchBytes` in `buffer`, incrementing the `lineNumber` every
/// time we find a new line.
/// </summary>
/// <returns>true if we found the searchBytes, false otherwise</returns>
static bool TryFindBytesInBuffer(
    ref ReadOnlySequence<byte> buffer,
    in ReadOnlySpan<byte> searchBytes,
    ref int lineNumber)
{
    var bufferReader = new SequenceReader<byte>(buffer);
    while (TryReadLine(ref bufferReader, out var line))
    {
        if (ContainsBytes(ref line, searchBytes))
            return true;

        lineNumber++;
    }
    return false;
}

static bool TryReadLine(
    ref SequenceReader<byte> bufferReader,
    out ReadOnlySequence<byte> line)
{
    var foundNewLine = bufferReader.TryReadTo(out line, (byte)'\n', advancePastDelimiter: true);
    if (!foundNewLine)
    {
        line = default;
        return false;
    }

    return true;
}

static bool ContainsBytes(
    ref ReadOnlySequence<byte> line,
    in ReadOnlySpan<byte> searchBytes)
{
    return new SequenceReader<byte>(line).TryReadTo(out var _, searchBytes);
}

I'm using SequenceReader<byte> above because my understanding is that it's more intelligent/faster than ReadOnlySequence<byte>; it has a fast path for when it can operate on a single Span<byte>.

Here are the benchmark results (.NET Core 3.1). Full code and BenchmarkDotNet results are available in this repo.

  • GetLineNumberWithStreamAsync - 435.6 ms while allocating 366.19 MB
  • GetLineNumberUsingPipeAsync - 619.8 ms while allocating 9.28 MB

Am I doing something wrong in the pipelines code?

Update: Evk has answered the question. After applying his fix, here are the new benchmark numbers:

  • GetLineNumberWithStreamAsync - 452.2 ms while allocating 366.19 MB
  • GetLineNumberWithPipeAsync - 203.8 ms while allocated 9.28 MB

Solution

  • I believe the reason is implementaiton of SequenceReader.TryReadTo. Here is the source code of this method. It uses pretty straightforward algorithm (read to the match of first byte, then check if all subsequent bytes after that match, if not - advance 1 byte forward and repeat), and note how there are quite some methods in this implementation called "slow" (IsNextSlow, TryReadToSlow and so on), so under at least certain circumstances and in certain cases it falls back to some slow path. It also has to deal with the fact sequence might contain multiple segments, and with maintaining the position.

    In your case you can avoid using SequenceReader specifically for searching the match (but leave it for actually reading lines), for example with this minor changes (this overload of TryReadTo is also more efficient in this case):

    private static bool TryReadLine(ref SequenceReader<byte> bufferReader, out ReadOnlySpan<byte> line) {
        // note that both `match` and `line` are now `ReadOnlySpan` and not `ReadOnlySequence`
        var foundNewLine = bufferReader.TryReadTo(out ReadOnlySpan<byte> match, (byte) '\n', advancePastDelimiter: true);
    
        if (!foundNewLine) {
            line = default;
            return false;
        }
    
        line = match;
        return true;
    }
    

    Then:

    private static bool ContainsBytes(ref ReadOnlySpan<byte> line, in ReadOnlySpan<byte> searchBytes) {
        // line is now `ReadOnlySpan` so we can use efficient `IndexOf` method
        return line.IndexOf(searchBytes) >= 0;
    }
    

    This will make your pipes code run faster than streams one.