c# http tcp httpwebrequest tcpclient

Server sending invalid gzip to TcpClient


I'm trying to learn more about how the web and TCP work by implementing a web client on top of TcpClient.

Currently, my web request function looks like this:

    public string SendWebRequest(SocketWebRequest request)
    {
        using (NetworkStream ns = tc.GetStream())
        {
            using (System.IO.StreamReader sr = new System.IO.StreamReader(ns))
            {
                request.WriteTo(ns);
                ns.Flush();

                var statusLine = sr.ReadLine();
                ProcessStatusLine(statusLine);

                Headers = ReadHeaders(sr);

                ProcessCookies(request.Host);

                int contentLength = 0;
                if (Headers.ContainsKey("Content-Length"))
                {
                    foreach (var cl in Headers["Content-Length"])
                    {
                        int buf;
                        if (int.TryParse(cl, out buf))
                        {
                            contentLength = buf;
                            break;
                        }
                    }
                }
                if (contentLength == 0)
                {
                    return "";
                }

                byte[] content = new byte[contentLength];

                if (IsGziped())
                {
                    MemoryStream decompressed = new MemoryStream();

                    using (var zs = new GZipStream(ns, CompressionMode.Decompress))
                    {
                        while (true)
                        {
                            var buf = new byte[1024];
                            int read = zs.Read(buf, 0, buf.Length);
                            if (read == 0)
                            {
                                break;
                            }
                            decompressed.Write(buf, 0, read);
                        }
                    }
                    content = decompressed.ToArray();
                }
                else
                {
                    using (BinaryReader rdr = new BinaryReader(ns))
                    {
                        rdr.Read(content, 0, content.Length);
                    }
                }

                var encoding = GetEncoding();

                return encoding.GetString(content.ToArray());
            }
        }
    }

The request looks like this:

GET http://www.youtube.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, */*
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host:www.youtube.com

and the response headers look like this:

HTTP/1.1 200 OK
Date: Sat, 25 Aug 2012 19:46:51 GMT
Server: Apache
X-Content-Type-Options: nosniff
Content-Encoding: gzip
Set-Cookie: use_hitbox=d5c5516c3379125f43aa0d495d100d6ddAEAAAAw; path=/; domain=.youtube.com
Set-Cookie: VISITOR_INFO1_LIVE=av7rkkf4Sfw; path=/; domain=.youtube.com; expires=Mon, 22-Apr-2013 19:46:51 GMT
Expires: Tue, 27 Apr 1971 19:44:06 EST
Cache-Control: no-cache
P3P: CP="This is not a P3P policy! See //support.google.com/accounts/bin/answer.py?answer=151657&hl=en-US for more info."
X-Frame-Options: SAMEORIGIN
Content-Length: 18977
Content-Type: text/html; charset=utf-8

After this, the first call to int read = zs.Read(buf, 0, buf.Length); sometimes works, but often fails with the following exception:

The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.

YouTube works fine in a browser. When I read the data as a string, it looks encoded.

Why am I getting this, and how should I fix that?

UPDATE

It looks like some sort of error during transmission. It works in about 5 cases out of 10 and fails in the other 5 for no apparent reason.

Here's the code of IsGziped():

    bool IsGziped()
    {
        foreach (var h in Headers["Content-Encoding"])
        {
            if (h.ToLowerInvariant().Contains("gzip"))
            {
                return true;
            }
        }
        return false;
    }

Solution

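  • StreamReader does not necessarily read only the bytes it hands back to you. It buffers internally, so it can pull more data out of the NetworkStream ns than the header lines you asked for. Part of the compressed body then ends up in the StreamReader's internal buffer.

    Once those bytes have been consumed, GZipStream cannot read them from ns anymore. It starts somewhere in the middle of the body, so the GZip magic number check fails. How much gets buffered likely depends on how much data had already arrived when StreamReader filled its buffer, which would explain why the failure is intermittent.

    You probably need custom header parsing that works at the byte level. There is no way to restrict StreamReader to reading only the minimum number of bytes.

    StreamReader is not designed to share a stream with other readers.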
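
    A minimal sketch of what byte-level parsing could look like. The class and method names (RawHttpReader, ReadLine, ReadExactly, Gunzip) are made up for illustration and are not part of the question's code. It assumes header lines are ASCII and end in LF or CRLF, and that the response carries a Content-Length (no chunked transfer encoding).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not from the question: reads HTTP data without read-ahead.
public static class RawHttpReader
{
    // Reads one line directly from the stream, one byte at a time, so nothing
    // past the line terminator is consumed. Returns null at end of stream.
    public static string ReadLine(Stream stream)
    {
        var line = new MemoryStream();
        bool readAny = false;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            readAny = true;
            if (b == '\n') break;          // LF ends the line
            line.WriteByte((byte)b);
        }
        if (!readAny) return null;

        byte[] bytes = line.ToArray();
        int length = bytes.Length;
        if (length > 0 && bytes[length - 1] == '\r') length--;  // drop CR of CRLF
        return Encoding.ASCII.GetString(bytes, 0, length);      // assumption: ASCII headers
    }

    // Reads exactly 'count' bytes. A single Stream.Read call may return fewer
    // bytes than requested, so loop until the buffer is full.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the body was complete.");
            offset += read;
        }
        return buffer;
    }

    // Decompresses a gzip body that has already been read into memory.
    public static byte[] Gunzip(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

    With something like this, the status line and headers would be read with RawHttpReader.ReadLine(ns) until an empty line, the body with ReadExactly(ns, contentLength), and only then passed to Gunzip if Content-Encoding is gzip. Reading byte by byte is slow, but it leaves the body in the stream; a faster variant would need its own buffer that is shared between header and body reading.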