I'm researching proxies in Node.js and came across something that blew my mind. In an example of an http.request
connection, the source code used this as the options object:
const options = {
  port: 1337,
  host: '127.0.0.1',
  method: 'CONNECT',
  path: 'www.google.com:80'
};
This was part of a much larger example that implements the whole tunneling system. Can someone explain how the options above work? The whole code is below:
const http = require('http');
const net = require('net');
const { URL } = require('url');

// Create an HTTP tunneling proxy
const proxy = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('okay');
});
proxy.on('connect', (req, clientSocket, head) => {
  // Connect to an origin server
  const { port, hostname } = new URL(`http://${req.url}`);
  const serverSocket = net.connect(port || 80, hostname, () => {
    clientSocket.write('HTTP/1.1 200 Connection Established\r\n' +
                       'Proxy-agent: Node.js-Proxy\r\n' +
                       '\r\n');
    serverSocket.write(head);
    serverSocket.pipe(clientSocket);
    clientSocket.pipe(serverSocket);
  });
});

// Now that proxy is running
proxy.listen(1337, '127.0.0.1', () => {
  // Make a request to a tunneling proxy
  const options = {
    port: 1337,
    host: '127.0.0.1',
    method: 'CONNECT',
    path: 'www.google.com:80'
  };
  const req = http.request(options);
  req.end();
  req.on('connect', (res, socket, head) => {
    console.log('got connected!');
    // Make a request over an HTTP tunnel
    socket.write('GET / HTTP/1.1\r\n' +
                 'Host: www.google.com:80\r\n' +
                 'Connection: close\r\n' +
                 '\r\n');
    socket.on('data', (chunk) => {
      console.log(chunk.toString());
    });
    socket.on('end', () => {
      proxy.close();
    });
  });
});
You have probably never used a network that requires you to configure an HTTP proxy. Most networks these days configure their firewall to allow HTTP traffic, so most people have never needed an HTTP proxy to access the web.
A long, long time ago, when I first started using the internet (around 1994), a lot of networks didn't allow transparent internet access. Your PC had no connection to the outside world. Instead, sysadmins would install an HTTP proxy that you could connect to. Your PC would only have access to the LAN (which the proxy was a part of), and only the HTTP proxy would have access to the internet.
Here's an example of how you'd configure Windows to use an HTTP proxy: set the proxy address to proxy.example.com and the port to 8080.
If you configure your PC like this, then when you connect to www.google.com your browser connects to the host proxy.example.com on port 8080 and asks it to fetch data from www.google.com.
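As a rough sketch of what the browser does in that setup, here is how the same kind of request could be described for http.request in Node.js. The helper name viaProxy is hypothetical (it is not part of Node.js), and proxy.example.com:8080 is only the example proxy from above. The connection goes to the proxy, while the full URL of the real destination goes in the path:

```javascript
const { URL } = require('url');

// Hypothetical helper (not a Node.js API): builds http.request() options
// for sending a plain HTTP GET through a forward proxy.
function viaProxy(proxyHost, proxyPort, targetUrl) {
  const target = new URL(targetUrl);
  return {
    host: proxyHost,                 // the TCP connection goes to the proxy...
    port: proxyPort,
    method: 'GET',
    path: target.href,               // ...but the full target URL goes in the "path"
    headers: { Host: target.host }   // Host names the real destination
  };
}

const options = viaProxy('proxy.example.com', 8080, 'http://www.google.com/');
console.log(options);
```

Passing these options to http.request would only work if a proxy is actually listening at that address, so the sketch stops at building the options.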
As for why it calls the requested resource "path": it's because it is sent in the "path" part of the request line.
For example, a normal GET request for getting this page looks something like this:
GET /questions/60498963 HTTP/1.1
Host: stackoverflow.com
And the string after GET and before the protocol version is normally called the path:
        .---------- this is normally called
        |           the "path"
        v
GET /questions/60498963 HTTP/1.1
Host: stackoverflow.com
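If it helps to see the three parts separately, here is a minimal sketch that splits a request line on spaces. The function parseRequestLine is a made-up helper for illustration only and does no validation:

```javascript
// Hypothetical helper for illustration: splits an HTTP/1.1 request line
// into its three space-separated parts. No validation is performed.
function parseRequestLine(line) {
  const [method, target, version] = line.trim().split(' ');
  return { method, target, version };
}

const parsed = parseRequestLine('GET /questions/60498963 HTTP/1.1');
console.log(parsed.method);  // the HTTP method
console.log(parsed.target);  // the part normally called the "path"
console.log(parsed.version); // the protocol version
```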
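When making an ordinary request through a proxy, the request line carries the full URL instead:
GET http://stackoverflow.com/questions/60498963 HTTP/1.1
And for a CONNECT request, which is what your code sends, only the target host and port are sent:
CONNECT stackoverflow.com:80 HTTP/1.1
So the destination, including the domain name, is sent to the proxy in the part of the request usually used to send the file path.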
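To check what the options object from the question actually puts on the wire, one option is to point the request at a throwaway TCP server and print the first line it receives. This is only a sketch: captureRequestLine is a hypothetical helper, and it overrides host and port so the request goes to a temporary local server on a random port rather than to port 1337.

```javascript
const http = require('http');
const net = require('net');

// Hypothetical helper (not a Node.js API): sends the request described by
// `options` to a temporary local TCP server and resolves with the raw
// request line that server received.
function captureRequestLine(options) {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      let raw = '';
      socket.on('error', () => {});
      socket.on('data', (chunk) => {
        raw += chunk.toString('latin1');
        const eol = raw.indexOf('\r\n');
        if (eol !== -1) {
          // The first line is all we need; drop the connection and stop listening.
          socket.destroy();
          server.close();
          resolve(raw.slice(0, eol));
        }
      });
    });
    server.on('error', reject);
    // Port 0 asks the OS for any free port (an assumption of this sketch).
    server.listen(0, '127.0.0.1', () => {
      const req = http.request({
        ...options,
        host: '127.0.0.1',
        port: server.address().port
      });
      // The server destroys the socket on purpose, so ignore the resulting error.
      req.on('error', () => {});
      req.end();
    });
  });
}

captureRequestLine({ method: 'CONNECT', path: 'www.google.com:80' })
  .then((line) => console.log(line));
```

The line printed should start with the method, followed by whatever string was given as path, which is why 'www.google.com:80' ends up where a file path would normally be.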
Note that all this has nothing to do with Node.js. This is just basic networking (no programming languages involved).