Tags: c#, .net, async-await, deflatestream

DeflateStream.ReadAsync (.NET 4.5 System.IO.Compression) returns a different number of bytes read than the equivalent Read method?


While converting some older code to use async in C#, I started seeing differences between the return values of the Read() and ReadAsync() methods of DeflateStream.

I thought that the transition from synchronous code like

bytesRead = deflateStream.Read(buffer, 0, uncompressedSize);

to its equivalent asynchronous version of

bytesRead = await deflateStream.ReadAsync(buffer, 0, uncompressedSize);

should always return the same value.


See the updated code at the bottom of the question, which uses streams the correct way and so makes the initial question irrelevant.


I found that after a number of iterations this didn't hold true, and in my specific case it was causing random errors in the converted application.

Am I missing something here?

Below is a simple repro case (in a console app), where the Assert breaks for me in the ReadAsync method on iteration #412, giving output that looks like this:

....
ReadAsync #410 - 2055 bytes read
ReadAsync #411 - 2055 bytes read
ReadAsync #412 - 453 bytes read
---- DEBUG ASSERTION FAILED ----

My question is, why is the DeflateStream.ReadAsync method returning 453 bytes at this point?

Note: this only happens with certain input strings. The massive StringBuilder block in CreateProblemDataString was the best way I could think of to construct the string for this post.

using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static byte[] DataAsByteArray;
    static int uncompressedSize;

    static void Main(string[] args)
    {
        string problemDataString = CreateProblemDataString();
        DataAsByteArray = Encoding.ASCII.GetBytes(problemDataString);
        uncompressedSize = DataAsByteArray.Length;
        MemoryStream memoryStream = new MemoryStream();
        using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress, true))
        {
            for (int i = 0; i < 1000; i++)
            {
                deflateStream.Write(DataAsByteArray, 0, uncompressedSize);
            }
        }

        // now read it back synchronously
        Read(memoryStream);

        // now read it back asynchronously
        Task retval = ReadAsync(memoryStream);
        retval.Wait();
    }

    static void Read(MemoryStream memoryStream)
    {
        memoryStream.Position = 0;
        using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
        {
            byte[] buffer = new byte[uncompressedSize];
            int bytesRead = -1;
            int i = 0;
            while (bytesRead > 0 || bytesRead == -1)
            {
                bytesRead = deflateStream.Read(buffer, 0, uncompressedSize);
                System.Diagnostics.Debug.WriteLine("Read #{0} - {1} bytes read", i, bytesRead);
                System.Diagnostics.Debug.Assert(bytesRead == 0 || bytesRead == uncompressedSize);
                i++;
            }
        }
    }

    static async Task ReadAsync(MemoryStream memoryStream)
    {
        memoryStream.Position = 0;
        using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
        {
            byte[] buffer = new byte[uncompressedSize];
            int bytesRead = -1;
            int i = 0;
            while (bytesRead > 0 || bytesRead == -1)
            {
                bytesRead = await deflateStream.ReadAsync(buffer, 0, uncompressedSize);
                System.Diagnostics.Debug.WriteLine("ReadAsync #{0} - {1} bytes read", i, bytesRead);
                System.Diagnostics.Debug.Assert(bytesRead == 0 || bytesRead == uncompressedSize);
                i++;
            }
        }
    }

    /// <summary>
    /// This is one of the strings of data that was causing issues. 
    /// </summary>
    /// <returns></returns>
    static string CreateProblemDataString()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("0601051081                                      ");
        sb.Append("                                                                       ");
        sb.Append("                         225021         0300420");
        sb.Append("34056064070072076361102   13115016017");
        sb.Append("5      192         230237260250   2722");
        sb.Append("73280296      326329332   34535535");
        sb.Append("7   3                                                                  ");
        sb.Append("                                                                    4");
        sb.Append("                                                                             ");
        sb.Append("                                                         50");
        sb.Append("6020009      030034045   063071076   360102   13");
        sb.Append("1152176160170   208206      23023726025825027227328");
        sb.Append("2283285   320321333335341355357   622005009      0");
        sb.Append("34053      060070      361096   130151176174178172208");
        sb.Append("210198   235237257258256275276280290293   3293");
        sb.Append("30334   344348350                                                     ");
        sb.Append("                                                         ");
        sb.Append("                                           ");
        sb.Append("                                                                                   ");
        sb.Append("                                     225020012014   046042044034061");
        sb.Append("075078   361098   131152176160170   208195210   230");
        sb.Append("231260257258271272283306      331332336   3443483");
        sb.Append("54    29                                                           ");
        sb.Append("                                                                      ");
        sb.Append("                                                   2");
        sb.Append("5      29                                                06      0");
        sb.Append("1                                                            178      17");
        sb.Append("4                                                   205                     2");
        sb.Append("05      195                                                   2");
        sb.Append("31                     231      23");
        sb.Append("7                                       01              01    0");
        sb.Append("2                                              260                     26");
        sb.Append("2                                                            274                     2");
        sb.Append("72      274                                       01              01    0");
        sb.Append("3           1   5      3 6     43 52    ");
        return sb.ToString();
    }
}

UPDATED CODE TO READ STREAMS INTO BUFFER CORRECTLY

Output now looks like this:

...
ReadAsync #410 - 2055 bytes read
ReadAsync #411 - 2055 bytes read
ReadAsync PARTIAL #412 - 453 bytes read, offset for next read = 453
ReadAsync #412 - 1602 bytes read
ReadAsync #413 - 2055 bytes read
...


    static void Read(MemoryStream memoryStream)
    {
        memoryStream.Position = 0;
        using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
        {
            byte[] buffer = new byte[uncompressedSize]; // buffer to hold known fixed size record.
            int bytesRead; // number of bytes read from Read operation
            int offset = 0; // offset for writing into buffer
            int i = -1; // counter to track iteration #
            while ((bytesRead = deflateStream.Read(buffer, offset, uncompressedSize - offset)) > 0)
            {
                offset += bytesRead;  // offset in buffer for results of next reading
                System.Diagnostics.Debug.Assert(offset <= uncompressedSize, "should never happen - because would mean more bytes read than requested.");
                if (offset == uncompressedSize) // buffer full, complete fixed size record in buffer.
                {
                    offset = 0; // buffer is now filled, next read to start at beginning of buffer again.
                    i++; // increment counter that tracks iteration #
                    System.Diagnostics.Debug.WriteLine("Read #{0} - {1} bytes read", i, bytesRead);
                }
                else // buffer still not full
                {
                    System.Diagnostics.Debug.WriteLine("Read PARTIAL #{0} - {1} bytes read, offset for next read = {2}", i+1, bytesRead, offset);
                }
            }
        }
    }

    static async Task ReadAsync(MemoryStream memoryStream)
    {
        memoryStream.Position = 0;
        using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
        {
            byte[] buffer = new byte[uncompressedSize]; // buffer to hold known fixed size record.
            int bytesRead; // number of bytes read from Read operation
            int offset = 0; // offset for writing into buffer
            int i = -1; // counter to track iteration #
            while ((bytesRead = await deflateStream.ReadAsync(buffer, offset, uncompressedSize - offset)) > 0)
            {
                offset += bytesRead;  // offset in buffer for results of next reading
                System.Diagnostics.Debug.Assert(offset <= uncompressedSize, "should never happen - because would mean more bytes read than requested.");
                if (offset == uncompressedSize) // buffer full, complete fixed size record in buffer.
                {
                    offset = 0; // buffer is now filled, next read to start at beginning of buffer again.
                    i++; // increment counter that tracks iteration #
                    System.Diagnostics.Debug.WriteLine("ReadAsync #{0} - {1} bytes read", i, bytesRead);
                }
                else // buffer still not full
                {
                    System.Diagnostics.Debug.WriteLine("ReadAsync PARTIAL #{0} - {1} bytes read, offset for next read = {2}", i+1, bytesRead, offset);
                }
            }
        }
    }

Solution

  • Damien's comments are exactly correct. But your mistake is a common enough one, and IMHO the question deserves an actual answer, if for no other reason than to help others who make the same mistake find the answer more easily.

    So, to be clear:

    As is true for all of the stream-oriented I/O methods in .NET where one provides a byte[] buffer and the number of bytes read is returned by the method, the only assumptions you can make about the number of bytes are:

    1. The number will not be larger than the maximum number of bytes you asked to read (i.e. the count passed to the method).
    2. The number will be non-negative, and will be greater than 0 as long as there is in fact data remaining to be read (0 is returned when you reach the end of the stream).

    When reading with any of these methods, you cannot even count on the same method returning the same number of bytes from one call to the next. In some contexts the count is in fact deterministic, but you should still not rely on that. And there is no guarantee of any sort that different methods, even ones reading from the same source, will return the same number of bytes as each other.
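
    One way to check that calling code does not depend on read sizes is to run it against a stream that deliberately returns short reads. The sketch below is illustrative only: ShortReadStream and its maxChunk parameter are hypothetical names, not part of .NET, and the wrapper simply caps every Read at maxChunk bytes, which the two guarantees above permit.

```csharp
using System;
using System.IO;

// Hypothetical test helper (not part of .NET): wraps another stream and never
// returns more than maxChunk bytes per Read call, which the Stream contract allows.
sealed class ShortReadStream : Stream
{
    private readonly Stream inner;
    private readonly int maxChunk;

    public ShortReadStream(Stream inner, int maxChunk)
    {
        this.inner = inner;
        this.maxChunk = maxChunk;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Ask the inner stream for fewer bytes than the caller requested.
        return inner.Read(buffer, offset, Math.Min(count, maxChunk));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

static class Program
{
    static void Main()
    {
        // 10 bytes of data, delivered at most 4 bytes at a time.
        var source = new ShortReadStream(new MemoryStream(new byte[10]), 4);
        var buffer = new byte[10];
        int n;
        while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            Console.WriteLine("asked for {0}, got {1}", buffer.Length, n);
        }
    }
}
```

    With these inputs the loop sees reads of 4, 4 and 2 bytes before the final 0, even though 10 bytes were requested each time. Code that asserts bytesRead == count would fail against this wrapper, just as the original repro failed against DeflateStream.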

    It is up to the caller to read the bytes as a stream, taking into account the return value specifying the number of bytes read for each call, and reassembling those bytes in whatever manner is appropriate for that particular stream of bytes.
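
    For fixed-size records like the ones in the question, that reassembly can be wrapped in a small helper. This is a minimal sketch, assuming a hypothetical helper named ReadFullyAsync (not a framework method) and random 2055-byte records standing in for the original data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class Program
{
    // Hypothetical helper: keeps calling ReadAsync until count bytes have arrived
    // or the stream ends. Returns the total number of bytes placed in buffer.
    static async Task<int> ReadFullyAsync(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = await stream.ReadAsync(buffer, offset + total, count - total);
            if (n == 0) break; // end of stream before count bytes were available
            total += n;
        }
        return total;
    }

    static void Main()
    {
        // Compress 1000 fixed-size records of 2055 bytes, mirroring the question's setup.
        byte[] record = new byte[2055];
        new Random(1).NextBytes(record);
        var compressed = new MemoryStream();
        using (var deflate = new DeflateStream(compressed, CompressionMode.Compress, true))
        {
            for (int i = 0; i < 1000; i++) deflate.Write(record, 0, record.Length);
        }

        compressed.Position = 0;
        using (var inflate = new DeflateStream(compressed, CompressionMode.Decompress, true))
        {
            byte[] buffer = new byte[record.Length];
            int records = 0;
            // Blocking on .Result is acceptable in a console app with no synchronization context.
            while (ReadFullyAsync(inflate, buffer, 0, buffer.Length).Result == buffer.Length)
            {
                records++;
            }
            Console.WriteLine("complete records read: {0}", records);
        }
    }
}
```

    The caller now only ever sees a full record or the end of the stream, so it should count all 1000 records no matter how DeflateStream splits the individual reads.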

    Note that when dealing with Stream objects, you can use the Stream.CopyTo() method. Of course, it only copies to another Stream object, but in many cases the destination can be used without treating it as a stream. For example, you may just want to write the data to a file, or copy it to a MemoryStream and then call MemoryStream.ToArray() to get an array of bytes, which you can then access without any concern about how many bytes a given read operation returned. By the time you get to the array, all of them have been read :)
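
    As a rough illustration of the CopyTo() approach, the sketch below round-trips a short made-up string through DeflateStream and a MemoryStream. The sample data is an assumption for the example, not the original problem string.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class Program
{
    static void Main()
    {
        // Made-up sample data for illustration.
        byte[] original = Encoding.ASCII.GetBytes("0601051081 sample record data");

        var compressed = new MemoryStream();
        using (var deflate = new DeflateStream(compressed, CompressionMode.Compress, true))
        {
            deflate.Write(original, 0, original.Length);
        }

        compressed.Position = 0;
        byte[] roundTripped;
        using (var inflate = new DeflateStream(compressed, CompressionMode.Decompress, true))
        using (var destination = new MemoryStream())
        {
            // CopyTo reads repeatedly until the source reports end of stream,
            // so the size of each individual read never matters to the caller.
            inflate.CopyTo(destination);
            roundTripped = destination.ToArray();
        }

        // Prints True when the decompressed bytes match the original text.
        Console.WriteLine(Encoding.ASCII.GetString(roundTripped) == Encoding.ASCII.GetString(original));
    }
}
```

    In async code, Stream.CopyToAsync() (also available from .NET 4.5) can be awaited in the same place.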