I'm playing with TcpClient and NetworkStream on .NET Core 2.2, trying to get the content from https://www.google.com/
Before I continue, I'd like to make clear that I do NOT want to use the WebClient, HttpWebRequest or HttpClient classes. There are a lot of questions where people have run into problems using TcpClient and where answerers or commenters have suggested using something else for this task, so please don't.
Let's say we have an instance of SslStream obtained from TcpClient's NetworkStream and properly authenticated.
Let's say we also have a StreamWriter that we use to write HTTP messages to this stream and a StreamReader that we use to read HTTP message headers from the response:
var tcpClient = new TcpClient("google.com", 443);
var stream = tcpClient.GetStream();
var sslStream = new SslStream(stream, false);
sslStream.AuthenticateAsClient("google.com");
var streamWriter = new StreamWriter(sslStream);
var streamReader = new StreamReader(sslStream);
Say we send a request in the same way as a Firefox browser would have sent one:
GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
This causes the following response to be sent:
HTTP/1.1 200 OK
Date: Sun, 28 Apr 2019 17:28:27 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Strict-Transport-Security: max-age=31536000
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Content-Encoding: br
Server: gws
Content-Length: 55786
... etc
Now, after reading all response headers using streamReader.ReadLine() and parsing the content length found in the response header, let's read the response content into a buffer:
var totalBytesRead = 0;
int bytesRead;
var buffer = new byte[contentLength];
do
{
    bytesRead = sslStream.Read(buffer,
        totalBytesRead,
        contentLength - totalBytesRead);
    totalBytesRead += bytesRead;
} while (totalBytesRead < contentLength && bytesRead > 0);
However, this do..while loop only exits after the remote server closes the connection, which means the last call to Read hangs. That in turn suggests that the entire response content has already been sent and the server is already listening for another HTTP message on this stream. Is the contentLength incorrect? Does the streamReader read too much when calling ReadLine, messing up the SslStream position and causing invalid data to be read?
What gives? Has anyone had experience with this?
P.S. Here is sample console app code, with all safety checks omitted, which demonstrates this:
private static void Main(string[] args)
{
    using (var tcpClient = new TcpClient("google.com", 443))
    {
        var stream = tcpClient.GetStream();
        using (var sslStream = new SslStream(stream, false))
        {
            sslStream.AuthenticateAsClient("google.com");
            using (var streamReader = new StreamReader(sslStream))
            using (var streamWriter = new StreamWriter(sslStream))
            {
                streamWriter.WriteLine("GET / HTTP/1.1");
                streamWriter.WriteLine("Host: www.google.com");
                streamWriter.WriteLine("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0");
                streamWriter.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                streamWriter.WriteLine("Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2");
                streamWriter.WriteLine("Accept-Encoding: gzip, deflate, br");
                streamWriter.WriteLine("Connection: keep-alive");
                streamWriter.WriteLine("Upgrade-Insecure-Requests: 1");
                streamWriter.WriteLine("Cache-Control: max-age=0");
                streamWriter.WriteLine();
                streamWriter.Flush();
                var lines = new List<string>();
                var line = streamReader.ReadLine();
                var contentLength = 0;
                while (!string.IsNullOrWhiteSpace(line))
                {
                    var split = line.Split(": ");
                    if (split.First() == "Content-Length")
                    {
                        contentLength = int.Parse(split[1]);
                    }
                    lines.Add(line);
                    line = streamReader.ReadLine();
                }
                var totalBytesRead = 0;
                int bytesRead;
                var buffer = new byte[contentLength];
                do
                {
                    bytesRead = sslStream.Read(buffer,
                        totalBytesRead,
                        contentLength - totalBytesRead);
                    totalBytesRead += bytesRead;
                    Console.WriteLine(
                        $"Bytes read: {totalBytesRead} of {contentLength} (last chunk: {bytesRead} bytes)");
                } while (totalBytesRead < contentLength && bytesRead > 0);
                Console.WriteLine(
                    "--------------------");
            }
        }
    }
    Console.ReadLine();
}
EDIT
This always happens after I submit a question. I'd been scratching my head for a couple of days without being able to find the cause of the problem, but as soon as I submitted it, I knew it had something to do with StreamReader messing things up when trying to read a line.
So if I stop using the StreamReader and replace calls to ReadLine with something that reads byte-by-byte, everything seems to be fine. The replacement code can be written as follows:
private static IEnumerable<string> ReadHeader(Stream sslStream)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];
    // Initialize a four-character string to keep the last four bytes of the message
    var check = new StringBuilder("....");
    int bytes;
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        bytes = sslStream.Read(buffer, 0, 1);
        if (bytes == 0)
        {
            // If nothing was read, break the loop
            break;
        }
        // Add the received byte to the response builder.
        // We expect the header to be ASCII encoded so it's OK to just cast to char and append
        responseBuilder.Append((char) buffer[0]);
        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char) buffer[0]);
        // \r\n\r\n marks the end of the message header, so break here
        if (check.ToString() == "\r\n\r\n")
        {
            break;
        }
    } while (bytes > 0);
    var headerText = responseBuilder.ToString();
    return headerText.Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
}
...which would then make our sample console app look like this:
private static void Main(string[] args)
{
    using (var tcpClient = new TcpClient("google.com", 443))
    {
        var stream = tcpClient.GetStream();
        using (var sslStream = new SslStream(stream, false))
        {
            sslStream.AuthenticateAsClient("google.com");
            using (var streamWriter = new StreamWriter(sslStream))
            {
                streamWriter.WriteLine("GET / HTTP/1.1");
                streamWriter.WriteLine("Host: www.google.com");
                streamWriter.WriteLine("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0");
                streamWriter.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                streamWriter.WriteLine("Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2");
                streamWriter.WriteLine("Accept-Encoding: gzip, deflate, br");
                streamWriter.WriteLine("Connection: keep-alive");
                streamWriter.WriteLine("Upgrade-Insecure-Requests: 1");
                streamWriter.WriteLine("Cache-Control: max-age=0");
                streamWriter.WriteLine();
                streamWriter.Flush();
                var lines = ReadHeader(sslStream);
                var contentLengthLine = lines.First(x => x.StartsWith("Content-Length"));
                var split = contentLengthLine.Split(": ");
                var contentLength = int.Parse(split[1]);
                var totalBytesRead = 0;
                int bytesRead;
                var buffer = new byte[contentLength];
                do
                {
                    bytesRead = sslStream.Read(buffer,
                        totalBytesRead,
                        contentLength - totalBytesRead);
                    totalBytesRead += bytesRead;
                    Console.WriteLine(
                        $"Bytes read: {totalBytesRead} of {contentLength} (last chunk: {bytesRead} bytes)");
                } while (totalBytesRead < contentLength && bytesRead > 0);
                Console.WriteLine(
                    "--------------------");
            }
        }
    }
    Console.ReadLine();
}
private static IEnumerable<string> ReadHeader(Stream sslStream)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];
    // Initialize a four-character string to keep the last four bytes of the message
    var check = new StringBuilder("....");
    int bytes;
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        bytes = sslStream.Read(buffer, 0, 1);
        if (bytes == 0)
        {
            // If nothing was read, break the loop
            break;
        }
        // Add the received byte to the response builder.
        // We expect the header to be ASCII encoded so it's OK to just cast to char and append
        responseBuilder.Append((char)buffer[0]);
        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char)buffer[0]);
        // \r\n\r\n marks the end of the message header, so break here
        if (check.ToString() == "\r\n\r\n")
        {
            break;
        }
    } while (bytes > 0);
    var headerText = responseBuilder.ToString();
    return headerText.Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
}
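The answer to the question in the title is YES.
It can be trusted, as long as you read the message header properly, i.e. do not use StreamReader.ReadLine. StreamReader fills its own internal buffer from the underlying stream, so ReadLine can consume bytes that belong to the response content.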
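To see the effect without a network connection, the sketch below runs ReadLine over an in-memory response and checks how many bytes remain on the underlying stream afterwards. It is only an illustration: the sample response, the class name ReadAheadDemo and the method name BytesLeftAfterReadLine are made up, and a MemoryStream stands in for the SslStream.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Hypothetical response: a short header followed by a 5-byte body.
    public const string SampleResponse =
        "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";

    // Reads the header lines with StreamReader.ReadLine and returns how many
    // bytes are still unread on the underlying stream afterwards.
    public static long BytesLeftAfterReadLine(string response)
    {
        var stream = new MemoryStream(Encoding.ASCII.GetBytes(response));
        var reader = new StreamReader(stream, Encoding.ASCII);

        string line;
        do
        {
            line = reader.ReadLine();
        } while (!string.IsNullOrEmpty(line));

        // Anything the reader buffered is gone from the stream's point of view,
        // even if ReadLine never returned it to the caller.
        return stream.Length - stream.Position;
    }

    public static void Main()
    {
        Console.WriteLine(
            $"Body bytes left on the stream: {BytesLeftAfterReadLine(SampleResponse)}");
    }
}
```

With a response this small the reader pulls everything into its buffer on the first ReadLine, so the stream has no body bytes left for a later Read. On a real connection the amount read ahead depends on how much data has arrived, but the loop in the question ends up waiting for bytes that the reader has already taken.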
Here is a utility method which does the job:
private static string ReadStreamUntil(Stream stream, string boundary)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];
    // Initialize a string builder with placeholder chars of the same length as the boundary
    var boundaryPlaceholder = string.Join(string.Empty, boundary.Select(x => "."));
    var check = new StringBuilder(boundaryPlaceholder);
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        var byteCount = stream.Read(buffer, 0, 1);
        if (byteCount == 0)
        {
            // If nothing was read, break the loop
            break;
        }
        // Add the received byte to the response builder.
        responseBuilder.Append((char)buffer[0]);
        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char)buffer[0]);
        // The boundary marks the end of the message, so stop once the last chars match it
    } while (check.ToString() != boundary);
    return responseBuilder.ToString();
}
Then, to read the header, we can just call ReadStreamUntil(sslStream, "\r\n\r\n").
The key here is to read the stream byte by byte until a known byte sequence (in this case \r\n\r\n) is encountered.
After the header has been read with this method, the stream will be at the correct position for the response content to be read properly.
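The stream position can be checked against an in-memory response. This is a small sketch, not a full client: the class name HeaderDemo, the helper ReadBody and the sample response are assumptions for illustration, and ReadStreamUntil is a compact copy of the method above so that the example compiles on its own.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class HeaderDemo
{
    // Compact copy of ReadStreamUntil: one byte at a time, no read-ahead.
    public static string ReadStreamUntil(Stream stream, string boundary)
    {
        var buffer = new byte[1];
        var check = new StringBuilder(new string('.', boundary.Length));
        var result = new StringBuilder();
        do
        {
            if (stream.Read(buffer, 0, 1) == 0) break;
            result.Append((char)buffer[0]);
            check.Remove(0, 1).Append((char)buffer[0]);
        } while (check.ToString() != boundary);
        return result.ToString();
    }

    // Hypothetical helper: reads the header, parses Content-Length,
    // then reads that many body bytes (fewer only if the stream ends early).
    public static byte[] ReadBody(Stream stream)
    {
        var header = ReadStreamUntil(stream, "\r\n\r\n");
        var lengthLine = header
            .Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
            .First(x => x.StartsWith("Content-Length:", StringComparison.OrdinalIgnoreCase));
        var contentLength = int.Parse(lengthLine.Substring("Content-Length:".Length).Trim());

        var body = new byte[contentLength];
        var total = 0;
        int read;
        while (total < contentLength &&
               (read = stream.Read(body, total, contentLength - total)) > 0)
        {
            total += read;
        }
        return body;
    }

    public static void Main()
    {
        var bytes = Encoding.ASCII.GetBytes(
            "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello");
        using (var stream = new MemoryStream(bytes))
        {
            Console.WriteLine(Encoding.ASCII.GetString(ReadBody(stream)));
        }
    }
}
```

Because ReadStreamUntil never reads past the boundary, the Read loop that follows sees exactly the Content-Length bytes of the body.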
If needed, this method can easily be converted to an async variant by calling await ReadAsync instead of Read.
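One possible shape for the async variant is sketched below. The names AsyncHeaderReader and ReadStreamUntilAsync are made up for illustration, and ConfigureAwait(false) is an optional choice rather than something the approach requires.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncHeaderReader
{
    // Same byte-by-byte logic as ReadStreamUntil, with ReadAsync in place of Read.
    public static async Task<string> ReadStreamUntilAsync(Stream stream, string boundary)
    {
        var buffer = new byte[1];
        var check = new StringBuilder(new string('.', boundary.Length));
        var result = new StringBuilder();
        do
        {
            var byteCount = await stream.ReadAsync(buffer, 0, 1).ConfigureAwait(false);
            if (byteCount == 0)
            {
                // End of stream before the boundary was seen
                break;
            }
            result.Append((char)buffer[0]);
            check.Remove(0, 1).Append((char)buffer[0]);
        } while (check.ToString() != boundary);
        return result.ToString();
    }

    public static void Main()
    {
        var bytes = Encoding.ASCII.GetBytes(
            "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello");
        using (var stream = new MemoryStream(bytes))
        {
            // Blocking here only keeps the demo entry point synchronous.
            var header = ReadStreamUntilAsync(stream, "\r\n\r\n").GetAwaiter().GetResult();
            Console.WriteLine(header.TrimEnd());
        }
    }
}
```

Awaiting one byte at a time keeps the logic identical to the synchronous version; it favours simplicity over throughput.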
It's worth noting that the above method only works properly if the text is ASCII encoded, because each byte is cast directly to a char.
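If the text before the boundary might not be ASCII, one way around the restriction is to match the boundary on raw bytes and leave decoding to the caller. The sketch below assumes the boundary itself is ASCII; ByteBoundaryReader, ReadBytesUntil and the sample data are hypothetical names and values, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ByteBoundaryReader
{
    // Reads one byte at a time until the collected bytes end with the boundary
    // or the stream ends. Returns everything read, boundary included.
    public static byte[] ReadBytesUntil(Stream stream, byte[] boundary)
    {
        var collected = new List<byte>();
        var buffer = new byte[1];
        while (stream.Read(buffer, 0, 1) == 1)
        {
            collected.Add(buffer[0]);
            if (EndsWith(collected, boundary)) break;
        }
        return collected.ToArray();
    }

    // True if the last suffix.Length bytes of data equal suffix.
    private static bool EndsWith(List<byte> data, byte[] suffix)
    {
        if (data.Count < suffix.Length) return false;
        var offset = data.Count - suffix.Length;
        for (var i = 0; i < suffix.Length; i++)
        {
            if (data[offset + i] != suffix[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // Made-up UTF-8 text containing a non-ASCII character before the boundary.
        var raw = Encoding.UTF8.GetBytes("X-Name: \u017Eaba\r\n\r\nbody");
        using (var stream = new MemoryStream(raw))
        {
            var header = ReadBytesUntil(stream, Encoding.ASCII.GetBytes("\r\n\r\n"));
            // The caller chooses the encoding once the bytes are in hand.
            Console.WriteLine(Encoding.UTF8.GetString(header).TrimEnd());
        }
    }
}
```

The stream still stops right after the boundary, so the content that follows can be read by length exactly as before.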