c# stream compression gzip deflate

GZIP File corrupted - but why?


I am currently working on GZIP HTTP decompression.

My server receives some data, and I'm cropping it and saving it in binary mode. I wrote a little script that downloads the gzip from Stack Overflow and saves it to a .gz file. Works fine!

But the "gzip" I receive from my FortiGate firewall ends up being corrupted.

Corrupted and working file here: https://gofile.io/d/j520Nr

The buffer is the corrupted file, and I'm not sure why. Both files are extremely different (at least as far as I can see), but the GZIP header is definitely present!

Can someone maybe compare these two files and tell me why they are that different? Or maybe even show me how to fix it?

That's the gzip HTML URL for both of the files: What is the best way to parse html in C#?

My corrupted file is around 2KB larger!

I would be happy about every step in the right direction. Maybe it is something that can be fixed really easily!

The following code shows my workflow. "ReadAll" is pretty slow, but it reads everything from the stream. It will be optimized, of course (maybe it is the cause of the broken gzip stream?).

    public static byte[] ReadAll(NetworkStream stream, int buffer)
    {
        byte[] data = new byte[buffer];
        using MemoryStream ms = new MemoryStream();
        int numBytesRead;
        while ((numBytesRead = stream.Read(data, 0, data.Length)) > 0)
        {
            ms.Write(data, 0, numBytesRead);
        }
        return ms.ToArray();
    }

    private bool Handled = false;

    /// <summary>
    /// Handles Client and passes matches to the parser for more investigation
    /// </summary>
    /// <param name="obj"></param>
    private void HandleClient(object obj)
    {
        TcpClient client = (TcpClient)obj;
        Out.Log(LogLevel.Verbose, $"Client {client.Client.RemoteEndPoint} connected");
        Data = null; // Resets data after each received stream
        // Get a stream object for reading and writing
        NetworkStream stream = client.GetStream();
        //MemoryStream memory = new MemoryStream();

        // Wait to receive all the data sent by the client.
        if (stream.CanRead)
        {

            Out.Log(LogLevel.Debug, "Can read stream");
            StringBuilder c_completeMessage = new StringBuilder();

            if (!Handled)
            {
                Out.Log(LogLevel.Warning, "Handling first and last client.");
                Handled = true;
                int breakPoint = 0;
                byte[] res = ReadAll(stream, 1024);
                // Stop one byte early so that res[i + 1] never runs past the end of the array
                for (int i = 0; i < res.Length - 1; i++)
                {
                    // GZIP magic number: 0x1F 0x8B (31, 139)
                    if (res[i] == 0x1F && res[i + 1] == 0x8B)
                    {
                        breakPoint = i;
                        Out.Log(LogLevel.Error, GZIP_MAGIC + $" found. Magic Number of GZIP at :{breakPoint}:");
                        break;
                    }
                }

                byte[] res2 = res.SubArray(breakPoint, res.Length - breakPoint - 7); // (7 for offset linebreaks, eol, etc)
                res2.WriteToFile(@"C:\Users\--\Temporary\Buffer_ReadFully_cropped.gz");

Solution

  • As mentioned before, chunking and buffer size played a big role here.

    Remember that ICAP uses chunking, so you have to answer the previous packet with a 100 Continue; otherwise you will only receive the first X bytes from the server.
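
To illustrate the chunking point: if the saved buffer still contains the chunk framing (a hexadecimal size line before each block of data), the gzip stream is interrupted by those extra bytes, which would also make the file larger than the original. Below is a minimal sketch of stripping that framing, assuming the body uses standard HTTP-style chunked encoding, as ICAP does. `ChunkedBody`, `Decode`, and the `offset` parameter are hypothetical names, not part of the original code.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class ChunkedBody
{
    // Hypothetical helper: removes chunk framing of the form
    // "<hex size>[;extension]\r\n<data>\r\n ... 0\r\n\r\n" and returns only the data.
    // Assumption: raw[offset] is the first byte of the first chunk-size line.
    public static byte[] Decode(byte[] raw, int offset = 0)
    {
        using var output = new MemoryStream();
        int pos = offset;

        while (pos < raw.Length)
        {
            // Find the CRLF that terminates the chunk-size line.
            int lineEnd = IndexOfCrLf(raw, pos);
            if (lineEnd < 0)
                throw new InvalidDataException("Chunk-size line is not terminated.");

            string sizeLine = Encoding.ASCII.GetString(raw, pos, lineEnd - pos);

            // Drop an optional chunk extension (everything after ';').
            int semicolon = sizeLine.IndexOf(';');
            if (semicolon >= 0)
                sizeLine = sizeLine.Substring(0, semicolon);

            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture);

            pos = lineEnd + 2;          // skip the CRLF after the size line
            if (size == 0)
                break;                  // a zero-sized chunk ends the body

            if (pos + size > raw.Length)
                throw new InvalidDataException("Chunk is truncated.");

            output.Write(raw, pos, size);
            pos += size + 2;            // skip the data and its trailing CRLF
        }

        return output.ToArray();
    }

    // Returns the index of the next "\r\n" at or after start, or -1 if none exists.
    private static int IndexOfCrLf(byte[] data, int start)
    {
        for (int i = start; i < data.Length - 1; i++)
        {
            if (data[i] == (byte)'\r' && data[i + 1] == (byte)'\n')
                return i;
        }
        return -1;
    }
}
```

Note that `offset` is assumed to point at the first chunk-size line, which sits just before the GZIP magic bytes rather than at them. This only helps once the whole body has arrived: when the firewall sends a preview, the rest of the body follows only after the server answers with `ICAP/1.0 100 Continue`.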