Can I get a GZipStream for a file without writing to intermediate temporary storage?


Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I currently compress to a temporary file on disk to avoid the memory exhaustion that a MemoryStream could cause on very large files (this works fine).

    public void UploadFile(string filename)
    {
        using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
        {   
            using (var fileStream = File.OpenRead(filename))
            using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
            {
                fileStream.CopyTo(compressedStream);
            }

            temporaryFileStream.Position = 0;

            Uploader.Upload(temporaryFileStream);
        }
    }

What I'd like to do is eliminate the temporary storage by creating a GZipStream that reads from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?

Note that Upload is a static method with signature static void Upload(Stream stream).

Edit: The full code is here if it's useful. However, I hope I've included all the relevant context in the sample above.


Solution

  • Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.

    It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.

    Later: Here's an example that should come close to what you're asking for.

    using (var pcStream = new ProducerConsumerStream(BufferSize))
    {
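        // Start the upload on its own thread. Pass the method itself to the
        // Thread constructor and hand pcStream to it through Start().
        var uploadThread = new Thread(UploadThreadProc);
        uploadThread.Start(pcStream);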
    
        // Open the input file and attach the gzip stream to the pcStream
        using (var inputFile = File.OpenRead("inputFilename"))
        {
            // create gzip stream
            using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
            {
                var bytesRead = 0;
                var buff = new byte[65536]; // 64K buffer
                while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
                {
                    gz.Write(buff, 0, bytesRead);
                }
            }
        }
        // The entire file has been compressed and copied to the buffer.
        // Mark the stream as "input complete".
        pcStream.CompleteAdding();
    
        // wait for the upload thread to complete.
        uploadThread.Join();
    
        // It's very important that you don't close the pcStream before
        // the uploader is done!
    }
    

    The upload thread should be pretty simple:

    void UploadThreadProc(object state)
    {
        var pcStream = (ProducerConsumerStream)state;
        Uploader.Upload(pcStream);
    }
    

    You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
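
    ProducerConsumerStream is not one of the standard .NET stream classes, and the article describing it is not reproduced here, so what follows is only a minimal sketch of how such a stream might look, not the original implementation. It assumes a bounded BlockingCollection<byte[]> in place of a true circular buffer. The maxChunks parameter and the Demo.CompressAndUpload helper are hypothetical names introduced for illustration. The helper shows the variant mentioned above, with the producer on a background thread and the upload on the calling thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Threading;

// Hypothetical stand-in for the answer's ProducerConsumerStream:
// a bounded queue of byte chunks with one writer and one reader.
public sealed class ProducerConsumerStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks;
    private byte[] _current = Array.Empty<byte>(); // chunk the reader is draining
    private int _offset;                            // read position inside _current

    // maxChunks (assumed name) bounds memory: Write blocks once the queue is full.
    public ProducerConsumerStream(int maxChunks)
    {
        _chunks = new BlockingCollection<byte[]>(maxChunks);
    }

    // Producer side: copy the bytes, because callers usually reuse their buffer.
    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return;
        var chunk = new byte[count];
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        _chunks.Add(chunk); // blocks while the queue is full
    }

    // Producer side: signal that no more data will be written.
    public void CompleteAdding() => _chunks.CompleteAdding();

    // Consumer side: blocks until data arrives; returns 0 only after
    // CompleteAdding has been called and the queue has been drained.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (_offset == _current.Length)
        {
            if (!_chunks.TryTake(out var next, Timeout.Infinite))
                return 0; // completed and empty: end of stream
            _current = next;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _chunks.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    // Compresses the file on a background thread while 'upload' consumes the
    // compressed bytes on the calling thread. Error handling is deliberately minimal.
    public static void CompressAndUpload(string path, Action<Stream> upload)
    {
        using (var pcStream = new ProducerConsumerStream(16))
        {
            var producer = new Thread(() =>
            {
                try
                {
                    using (var input = File.OpenRead(path))
                    using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
                        input.CopyTo(gz);
                }
                finally
                {
                    pcStream.CompleteAdding(); // always let the reader finish
                }
            });
            producer.Start();
            upload(pcStream);
            producer.Join();
        }
    }
}
```

    With the uploader from the question, the call would be Demo.CompressAndUpload(filename, Uploader.Upload). This sketch is forward-only: CanSeek is false and Length throws, so it only works if the uploader reads the stream sequentially and does not need to know its length in advance.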