Tags: c# | .net | optimization | memory-management

How do I use streams in .NET to unpack data without duplicating it?


I have this inherited code:

private string DecompressData(byte[] data)
{
    byte[] decompressedData;
    using (MemoryStream ms = new MemoryStream(data))
    {
        using (GZipStream gzip = new GZipStream(ms, CompressionMode.Decompress))
        {
            using (MemoryStream resultStream = new MemoryStream())
            {
                gzip.CopyTo(resultStream);
                decompressedData = resultStream.ToArray();
            }
        }
    }

    return Encoding.UTF8.GetString(decompressedData);
}

The data array can be up to 1 MB, and the compression ratio can reach about 32x, it seems. However, I am seeing big jumps in memory usage (around 100 MB sometimes) from this method. It is part of a container task that I am trying to parallelize as much as possible to get the most out of each container, but I am currently seeing OOM errors. How can I rewrite this code so that the input data is released as the next stream reads it? Or what different approach can I take to do the same task in a more memory-efficient way?


Solution

  • Could try something like this to see if it uses less memory:

    private string DecompressData(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream(data))
        using (GZipStream gzip = new GZipStream(ms, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip, Encoding.UTF8))
        {
            // Read directly from the GZipStream and decode into a string
            return reader.ReadToEnd();
        }
    }
    

    This way the decompressed bytes are decoded directly into the string, rather than first being copied into an intermediate `MemoryStream` and then duplicated again by `ToArray()`.
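
  • Note that even with the `StreamReader` version, the returned string itself is large: 1 MB of input at a 32x ratio is roughly 32 MB of UTF-8, and .NET strings are UTF-16, so each call can still hold on the order of 64 MB. If many of these run in parallel, that alone can explain OOM errors. If the caller can consume the text incrementally, one option is to avoid materializing the whole string at all. A minimal sketch, assuming the text is line-oriented (the `processLine` callback is hypothetical — substitute whatever handling you need):

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    private void DecompressAndProcess(byte[] data, Action<string> processLine)
    {
        using (var ms = new MemoryStream(data))
        using (var gzip = new GZipStream(ms, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Only one line of decompressed text is held in memory
                // at a time, instead of the entire payload.
                processLine(line);
            }
        }
    }
    ```

    With this shape, peak memory per call is roughly the input buffer plus the GZip window plus one line of text, regardless of the total decompressed size.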