
How can a path and a host be completely different in Node.js?


I'm researching proxies in Node.js and came across something that blew my mind. In an example that uses http.request, the options object looked like this:

  const options = {
    port: 1337,
    host: '127.0.0.1',
    method: 'CONNECT',
    path: 'www.google.com:80'
  };

This was part of a much larger example that implements a whole tunneling system. Can someone explain how the options above work? The whole code is below:

const http = require('http');
const net = require('net');
const { URL } = require('url');

// Create an HTTP tunneling proxy
const proxy = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('okay');
});
proxy.on('connect', (req, clientSocket, head) => {
  // Connect to an origin server
  const { port, hostname } = new URL(`http://${req.url}`);
  const serverSocket = net.connect(port || 80, hostname, () => {
    clientSocket.write('HTTP/1.1 200 Connection Established\r\n' +
                    'Proxy-agent: Node.js-Proxy\r\n' +
                    '\r\n');
    serverSocket.write(head);
    serverSocket.pipe(clientSocket);
    clientSocket.pipe(serverSocket);
  });
});

// Now that proxy is running
proxy.listen(1337, '127.0.0.1', () => {

  // Make a request to a tunneling proxy
  const options = {
    port: 1337,
    host: '127.0.0.1',
    method: 'CONNECT',
    path: 'www.google.com:80'
  };

  const req = http.request(options);
  req.end();

  req.on('connect', (res, socket, head) => {
    console.log('got connected!');

    // Make a request over an HTTP tunnel
    socket.write('GET / HTTP/1.1\r\n' +
                 'Host: www.google.com:80\r\n' +
                 'Connection: close\r\n' +
                 '\r\n');
    socket.on('data', (chunk) => {
      console.log(chunk.toString());
    });
    socket.on('end', () => {
      proxy.close();
    });
  });
});

Source: https://nodejs.org/api/http.html#http_event_connect
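For reference, the proxy's 'connect' handler in the code above recovers the destination from req.url, which holds the same string the client passed as path. The sketch below isolates that step so it can be run on its own. The helper name parseConnectTarget is made up for illustration and is not part of Node.js.

```javascript
const { URL } = require('url');

// Hypothetical helper mirroring the parsing done in the proxy's 'connect' handler.
function parseConnectTarget(target) {
  // A CONNECT target has no scheme, so one is prepended to make it parseable.
  const { hostname, port } = new URL(`http://${target}`);
  // URL drops the default port of the scheme, so ':80' comes back as ''.
  // That is why the original code falls back with `port || 80`.
  return { hostname, port: Number(port) || 80 };
}

console.log(parseConnectTarget('www.google.com:80'));
console.log(parseConnectTarget('example.com:8443'));
```

The fallback to 80 only matters because the http: scheme was prepended for parsing. A non-default port such as 8443 survives unchanged.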


Solution

  • You have probably never used a network that requires you to configure an HTTP proxy. Most networks these days configure their firewalls to allow HTTP traffic, so most people have never needed an HTTP proxy to access the web.

    A long, long time ago, when I first started using the internet (around 1994), a lot of networks didn't allow direct internet access. Your PC had no connection to the outside world. Instead, sysadmins would install an HTTP proxy that you could connect to: your PC only had access to the LAN (which the proxy was part of), and only the HTTP proxy had access to the internet.

    Here's an example of how you'd configure Windows to use a HTTP proxy:

    [Image: Windows proxy settings pointing at proxy.example.com, port 8080]

    If you configure your PC as above, then when you go to www.google.com your browser connects to the host proxy.example.com on port 8080 and asks it to fetch the data from www.google.com.

    As for why the requested resource is called the path: it is sent in the "path" part of the HTTP request line.

    For example, a normal GET request for getting this page looks something like this:

    GET /questions/60498963 HTTP/1.1
    Host: stackoverflow.com
    

    The string after GET and before the protocol version is normally called the path:

               .---------- this is normally called
               |           the "path"
               v
    GET /questions/60498963 HTTP/1.1
    Host: stackoverflow.com
    

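    When asking a proxy to open a tunnel, the request line looks like this:

    CONNECT stackoverflow.com:443 HTTP/1.1
    

    So the host name and port of the destination are sent to the proxy in the part of the request usually used for the file path. That is why path in the options holds www.google.com:80, while host and port point at the proxy itself. (For an ordinary, non-tunnel request through a proxy, the full URL goes in the same position: GET http://stackoverflow.com/questions/60498963 HTTP/1.1.)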

    Note that none of this is specific to Node.js. It is how HTTP proxying works at the protocol level, whatever language the client is written in.
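To tie this back to the options object in the question, here is a rough sketch of which field ends up where. The describeRequest helper is hypothetical and only formats strings. It does not show how Node.js builds the request internally.

```javascript
// Hypothetical helper, for illustration only: shows which option goes where.
function describeRequest(options) {
  return {
    // host and port decide where the TCP connection is opened (here, the proxy).
    connectTo: `${options.host}:${options.port}`,
    // method and path are written into the first line of the HTTP request.
    requestLine: `${options.method} ${options.path} HTTP/1.1`,
  };
}

const summary = describeRequest({
  port: 1337,
  host: '127.0.0.1',
  method: 'CONNECT',
  path: 'www.google.com:80',
});

console.log(summary.connectTo);   // where the socket connects
console.log(summary.requestLine); // what the proxy reads first
```

Seen this way, host and path do not conflict. One names the machine you talk to, and the other is a string that machine is asked to act on.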