c#, .net-core, tcpclient

How to receive the HTML of a webpage using the .NET TcpClient


I want to get the HTML of a page given its address, and I would like to know how to use TcpClient to make valid GET requests. I cannot use HttpClient.

        using var tcpClient = new TcpClient();

        var hostname = "google.com";
        tcpClient.Connect(hostname, 80);

        using NetworkStream networkStream = tcpClient.GetStream();
        networkStream.ReadTimeout = 2000;

        var message = @"GET / HTTP/1.1
                        Accept: text/html, charset=utf-8
                        Connection: close
                        " + "\r\n\r\n";

        Console.WriteLine(message);

        using var reader = new StreamReader(networkStream, Encoding.UTF8);
        byte[] bytes = Encoding.UTF8.GetBytes(message);

        networkStream.Write(bytes, 0, bytes.Length);
        Console.WriteLine(reader.ReadToEnd());

I have tried this; however, I receive a 400 Bad Request error.


Solution

  • As mentioned by @rene, the request needs a Host header field, and the header lines must not have spaces or tabs before them:

    var hostname = "google.com";
    tcpClient.Connect(hostname, 80);
    //...
    var message = @$"GET / HTTP/1.1
    Accept: text/html, charset=utf-8
    Connection: close
    Host: {hostname}
    " + "\r\n\r\n";
    

    This returns status code 301 Moved Permanently, pointing to www.google.com.

    var hostname = "www.google.com";
    tcpClient.Connect(hostname, 80);
    //...
    var message = @$"GET / HTTP/1.1
    Accept: text/html, charset=utf-8
    Connection: close
    Host: {hostname}
    " + "\r\n\r\n";
    

    With www.google.com as the hostname, the request succeeds with status code 200 OK.
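
As a rough sketch of the points above, the request can also be assembled with explicit `\r\n` terminators instead of a verbatim string, so that neither source indentation nor the line endings of the source file end up in the request. The helper names `BuildGetRequest`, `ParseStatusCode`, and `Fetch` are hypothetical and not part of the original answer. The sketch assumes plain HTTP on port 80 (no TLS) and simplifies the `Accept` header to `text/html`.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

// Show the request with its CRLF terminators made visible.
Console.Write(BuildGetRequest("www.google.com").Replace("\r\n", "\\r\\n\n"));

// Only touch the network when a hostname is passed on the command line.
if (args.Length > 0)
{
    string response = Fetch(args[0]);
    string statusLine = response.Split("\r\n")[0];
    Console.WriteLine($"Status code: {ParseStatusCode(statusLine)}");
    Console.WriteLine(response);
}

// Builds a minimal HTTP/1.1 GET request. Each header line starts at
// column 0 and ends with CRLF; an empty line ends the header section.
static string BuildGetRequest(string host, string path = "/")
{
    return $"GET {path} HTTP/1.1\r\n" +
           $"Host: {host}\r\n" +
           "Accept: text/html\r\n" +
           "Connection: close\r\n" +
           "\r\n";
}

// Extracts the numeric code from a status line such as
// "HTTP/1.1 301 Moved Permanently".
static int ParseStatusCode(string statusLine)
{
    string[] parts = statusLine.Split(' ', 3);
    return int.Parse(parts[1]);
}

// Sends the request over a plain TCP connection and returns the raw
// response (status line, headers, and body) as one string.
static string Fetch(string host, string path = "/")
{
    using var tcpClient = new TcpClient();
    tcpClient.Connect(host, 80);

    using NetworkStream stream = tcpClient.GetStream();
    byte[] bytes = Encoding.ASCII.GetBytes(BuildGetRequest(host, path));
    stream.Write(bytes, 0, bytes.Length);

    // "Connection: close" makes the server close the socket after the
    // response, which is what lets ReadToEnd return.
    using var reader = new StreamReader(stream, Encoding.UTF8);
    return reader.ReadToEnd();
}
```

Checking the status code this way makes it easy to tell a 301 from a 200 before printing the body. Note that the body may still arrive with chunked transfer encoding, in which case the raw text contains chunk-size lines that this sketch does not decode.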