Tags: c#, .net, filestream, deflatestream

Reading using DeflateStream doesn't match expected size


I'm writing a set of discrete binary items to a stream, which ends up on disk. I'm using a buffered FileStream to reduce the number of disk writes.

BinaryWriter -> DeflateStream -> FileStream (buffered)

Each item consists of a header (with some metadata) followed by the raw data, compressed:

1. Signature, 1 byte.
2. Timestamp, 8 bytes.
3. Uncompressed data size, 8 bytes.
4. Data (compressed, using DeflateStream), X bytes
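The header layout above can be sketched with BinaryWriter/BinaryReader to confirm that it occupies exactly 17 bytes before the compressed data starts. This is a minimal sketch, assuming the field order listed above; `ItemHeader` is a hypothetical helper name, not part of the original code.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes/reads the per-item header
// (1-byte signature, 8-byte timestamp, 8-byte uncompressed size).
public static class ItemHeader
{
    // 1 + 8 + 8 = 17 bytes, matching the "Reported position: 17" comments.
    public const int Size = sizeof(byte) + sizeof(long) + sizeof(long);

    public static void Write(BinaryWriter writer, byte signature, long timestampTicks, long uncompressedLength)
    {
        writer.Write(signature);          // 1 byte
        writer.Write(timestampTicks);     // 8 bytes, little-endian
        writer.Write(uncompressedLength); // 8 bytes, little-endian
    }

    public static (byte Signature, long TimestampTicks, long UncompressedLength) Read(BinaryReader reader)
    {
        // Tuple elements are evaluated left to right, so the fields are read in file order.
        return (reader.ReadByte(), reader.ReadInt64(), reader.ReadInt64());
    }
}
```

Only the compressed part that follows has a variable length, which is why the header round-trips correctly while the position after decompression does not.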

The issue is that when reading the data back (doing the inverse operation), the position in the stream doesn't match the expected value:

1. Read signature, 1 byte.
2. Read timestamp, 8 bytes (long).
3. Read data size, 8 bytes (long).
4. Read and decompress the data using DeflateStream, (above value) uncompressed bytes.

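This of course breaks reading of all the following items. For an item with 240_000 bytes of uncompressed data, decompressing it leaves the FileStream positioned past the end of that item's compressed bytes. Reading the header fields (including the data size, written right before the compressed data) works correctly, so the problem only starts with the compressed part.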

The issue is either with DeflateStream itself or with how I'm using it.

Writer

var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, 200 * 1_048_576);
var binaryWriter = new BinaryWriter(fileStream);

var item = new Item
{
    Signature = (byte)1,
    TimeStamp = DateTime.Now.Ticks,
    Data = new byte[] { .... }
};

binaryWriter.Write(item.Signature); //1 byte.
binaryWriter.Write(item.TimeStamp); //8 bytes.
binaryWriter.Write(item.Data.LongLength); //8 bytes.

//Reported position: 17 (1 + 8 + 8)
//Data Length: 240_000

using (var compressStream = new DeflateStream(fileStream, CompressionLevel.Optimal, true))
{
    compressStream.Write(item.Data);
    compressStream.Flush();
}

//Reported position: 8099 (1 + 8 + 8 + [compressed length])

//🔁 Repeats until all items are in cache.

Reader

await using var fileStream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
using var binaryReader = new BinaryReader(fileStream);

var items = new List<Item>();

while (fileStream.Position < fileStream.Length)
{
    var item = new Item
    {
        StreamPosition = fileStream.Position
    };

    fileStream.Position += 1; //Skip signature.

    item.TimeStampInTicks = binaryReader.ReadInt64(); //🆗
    item.DataLength = binaryReader.ReadInt64(); //🆗, 240_000

    //Reported position: 17 (1 + 8 + 8) //🆗

    await using (var compressStream = new DeflateStream(fileStream, CompressionMode.Decompress, true))
    using (var compressBinaryReader = new BinaryReader(compressStream))
    {
        compressBinaryReader.ReadBytes((int)item.DataLength);
        //Reading directly from compressStream (without the BinaryReader) gives the same result.
    }

    //Reported position: 8306 //📛
    //Expected position: 8099 (the stream is 207 bytes too far)

    items.Add(item);
}

Working Code

I have to store the compressed size as well and then reposition the stream manually, since DeflateStream reads past the end of the compressed data.

Writer

var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, 200 * 1_048_576);
var binaryWriter = new BinaryWriter(fileStream);

var item = new Item
{
    Signature = (byte)1,
    TimeStamp = DateTime.Now.Ticks,
    Data = new byte[] { .... }
};

binaryWriter.Write(item.Signature); //1 byte.
binaryWriter.Write(item.TimeStamp); //8 bytes.
binaryWriter.Write(item.Data.LongLength); //8 bytes, uncompressed length.

var start = fileStream.Position;
binaryWriter.Write(0L); //8 bytes, compressed length.

using (var compressStream = new DeflateStream(fileStream, UserSettings.All.CaptureCompression, true))
{
    compressStream.Write(item.Data);
    compressStream.Flush();
}

var end = fileStream.Position;
var compressedLength = end - start - 8; //Subtract the 8-byte placeholder, since start was captured before it was written.

fileStream.Position = start;
binaryWriter.Write(compressedLength); //8 bytes, compressed length.
fileStream.Position = end;

Reader

await using var fileStream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
using var binaryReader = new BinaryReader(fileStream);

var items = new List<Item>();

while (fileStream.Position < fileStream.Length)
{
    var item = new Item
    {
        StreamPosition = fileStream.Position
    };

    fileStream.Position += 1; //Skip signature.

    item.TimeStampInTicks = binaryReader.ReadInt64();
    item.DataLength = binaryReader.ReadInt64();
    var compressedLength = binaryReader.ReadInt64();
    var currentPosition = fileStream.Position;
    
    await using (var compressStream = new DeflateStream(fileStream, CompressionMode.Decompress, true))
    {
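        //Stream has no built-in ReadBytes; ReadExactly (.NET 7+) fills the buffer or throws.
        var data = new byte[item.DataLength];
        compressStream.ReadExactly(data);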
    }

    fileStream.Position = currentPosition + compressedLength;

    items.Add(item);
}
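If seeking back in the output stream is undesirable (for example, with a non-seekable output), one alternative to the writer above is to compress each item into a MemoryStream first, so the compressed length is known before anything is written. This is a minimal sketch, assuming the same on-disk layout as the working code (signature, timestamp, uncompressed length, compressed length, data); `CompressedRecord` is a hypothetical name, and it trades extra memory (one compressed item held in RAM) for not having to patch the length afterwards.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: writes one item in the same layout as the working writer,
// but without seeking back to patch the compressed length.
public static class CompressedRecord
{
    public static void Write(BinaryWriter writer, byte signature, long timestampTicks, byte[] data, CompressionLevel level)
    {
        using var buffer = new MemoryStream();

        // Disposing the DeflateStream writes the final deflate block into buffer.
        using (var deflate = new DeflateStream(buffer, level, leaveOpen: true))
        {
            deflate.Write(data, 0, data.Length);
        }

        writer.Write(signature);       // 1 byte
        writer.Write(timestampTicks);  // 8 bytes
        writer.Write(data.LongLength); // 8 bytes, uncompressed length
        writer.Write(buffer.Length);   // 8 bytes, compressed length (known up front)
        writer.Write(buffer.GetBuffer(), 0, (int)buffer.Length); // compressed bytes
    }
}
```

The working reader above can read records written this way unchanged, because the byte layout is identical.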

Solution

  • The problem here is that DeflateStream has its own internal buffer. When you read from DeflateStream, it reads data from the underlying stream (in this case, your fileStream) into that internal buffer. It doesn't know beforehand how many bytes it should read (that is, where exactly the compressed data ends), so it always tries to read as many bytes as its buffer can hold. This means that if your stream contains other bytes AFTER the compressed data, it's perfectly normal for DeflateStream to read past the compressed data while trying to fill its internal buffer on the last read. It won't use those extra bytes, but it will still read them, and that advances the position of your FileStream past the end of the compressed data.

    So the overread does not indicate a bug in the deflate procedure, but you do have to fix the position manually, and for that you need to know the size of the compressed data.

    Side note: it's better not to use BinaryReader to read the data in this case; use DeflateStream.Read instead (and don't forget to check its return value, which indicates how many bytes were actually read).
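The two points of the answer (loop over DeflateStream.Read and check its return value, then restore the position from the stored compressed length) can be combined into one helper. This is a sketch, assuming the compressed length was stored as in the working writer; `DeflateRecordReader` and `ReadAndReposition` are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: decompresses exactly uncompressedLength bytes, then moves the
// underlying stream to the end of the compressed data, undoing any DeflateStream overread.
public static class DeflateRecordReader
{
    public static byte[] ReadAndReposition(Stream source, long uncompressedLength, long compressedLength)
    {
        var start = source.Position;
        var result = new byte[uncompressedLength];

        using (var deflate = new DeflateStream(source, CompressionMode.Decompress, leaveOpen: true))
        {
            var total = 0;
            while (total < result.Length)
            {
                // Read may return fewer bytes than requested; 0 means no more data.
                var read = deflate.Read(result, total, result.Length - total);
                if (read == 0)
                    throw new EndOfStreamException("Compressed data ended before the expected length.");
                total += read;
            }
        }

        // source.Position may now be past start + compressedLength, so reset it explicitly.
        source.Position = start + compressedLength;
        return result;
    }
}
```

In the working reader, a call like `DeflateRecordReader.ReadAndReposition(fileStream, item.DataLength, compressedLength)` would stand in for the `compressStream` block and the final `fileStream.Position` assignment.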