
Get all uncompressed bytes from a compressed file


I've created a method for returning all the uncompressed bytes from a compressed file.

    public static byte[] GetAllBytesFromCompressedFile(string fullPath)
    {
        const int blockSize = 10000;
        byte[] block = new byte[blockSize];
        List<byte> allBytes = new List<byte>(blockSize);

        int counter = 0;
        using (FileStream file = new FileStream(fullPath, FileMode.Open))
        {
            using (DeflateStream compress = new DeflateStream(file, CompressionMode.Decompress))
            {
                int bytesRead = 0;
                do
                {
                    bytesRead = compress.Read(block, 0, blockSize);
                    counter += bytesRead;
                    allBytes.AddRange(block);
                } while (bytesRead == blockSize);
            }
        }

        return allBytes.GetRange(0, counter).ToArray();
    }

It works fine, but it may be called several million times in a loop. Most of the files are rather small, but some can be up to about 100 MB, and I didn't want to preallocate 100 MB for all the small ones. So I have a few questions:

  1. First of all, is there already a method like this in the framework? Or a better way of doing this?
  2. Is there a way to get the uncompressed size of a compressed file? (then I wouldn't have to get blocks in a loop and could call Read once)
  3. I've used List<byte> so I don't have to manually reallocate a byte array. Is there a more efficient way of appending bytes?

I'll put my new code here, even though this is probably not a hard problem for most people. Maybe someone will spot something else that can be improved, such as explicitly setting the buffer size?

    public static byte[] GetAllBytesFromCompressedFile(string fullPath)
    {
        using (MemoryStream allBytes = new MemoryStream())
        {
            using (FileStream file = new FileStream(fullPath, FileMode.Open))
            {
                using (DeflateStream compress = new DeflateStream(file, CompressionMode.Decompress))
                {
                    compress.CopyTo(allBytes);
                }
            }

            return allBytes.ToArray();
        }
    }

Solution

  • First of all, is there already a method like this in the framework? Or a better way of doing this?

    Use a MemoryStream as the buffer and use Stream.CopyTo to copy the data in one line.

    Is there a way to get the uncompressed size of a compressed file?

    No, deflate is a streaming format and does not store the uncompressed size. You could guess a value, since the uncompressed data will likely be bigger than the compressed input, but doing that is likely a waste of time.

    I've used List<byte> so I don't have to manually reallocate a byte array. Is there a more efficient way of appending bytes?

    This is inefficient. Every AddRange call appends the whole block, and GetRange and ToArray each copy the accumulated data again at the end, which burns CPU and memory on a big file. Use a MemoryStream. It grows its own buffer and uses block copies for its write operations.

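    Also, you have a bug: you always append one full buffer instead of only the bytes that Read returned, and you treat a short read as the end of the stream. Read may return fewer bytes than requested before the stream is exhausted, in which case the loop stops early and the result is truncated. That goes away with the suggested algorithm.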
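
If you do want to keep a manual read loop (for example, to process blocks as they arrive), a minimal sketch of a corrected version might look like the following. The `DeflateReader` class, the `ReadAllDeflated` helper, and the 81920-byte default block size are illustrative choices made for this example; they are not part of the question or the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DeflateReader
{
    // Hypothetical helper: decompresses everything from a raw-deflate stream.
    // blockSize is an arbitrary default chosen for this sketch.
    public static byte[] ReadAllDeflated(Stream compressed, int blockSize = 81920)
    {
        byte[] block = new byte[blockSize];

        using (var result = new MemoryStream())
        using (var inflate = new DeflateStream(compressed, CompressionMode.Decompress))
        {
            int bytesRead;
            // Read returns 0 only at the end of the stream. A smaller positive
            // count than requested just means "call Read again".
            while ((bytesRead = inflate.Read(block, 0, block.Length)) > 0)
            {
                // Append only the bytes that Read actually produced.
                result.Write(block, 0, bytesRead);
            }

            return result.ToArray();
        }
    }

    // Same signature as the method in the question, built on the helper above.
    public static byte[] GetAllBytesFromCompressedFile(string fullPath)
    {
        using (var file = new FileStream(fullPath, FileMode.Open, FileAccess.Read))
        {
            return ReadAllDeflated(file);
        }
    }
}
```

This is roughly the loop that `CopyTo` runs for you, so for the plain "give me all the bytes" case the one-line version is still the simpler choice. If you want to control the buffer size there, `Stream.CopyTo` also has an overload that takes a `bufferSize` argument.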