Tags: javascript, node.js, express, http-proxy, node-http-proxy

How does node-http-proxy parse the target url?


I ran into a problem and suspect that node-http-proxy is modifying my target URLs. A few examples are below.

I am using Express as my server and calling the MetaWeather API.

I am able to get data from these endpoints:

https://www.metaweather.com/api/location/2487956/
https://www.metaweather.com/api/location/2487956/2013/4/30/

But when I call the API at https://www.metaweather.com/api/location/search/?lattlong=36.96,-122.02

it fails with status code 500, which leads me to think that node-http-proxy appended something after 122.02 because the URL does not end with /

server.js

const express = require("express");
const next = require("next");
const dev = process.env.NODE_ENV !== "production";
const app = next({ dev });
const handle = app.getRequestHandler();

const httpProxy = require("http-proxy");

const proxyOptions = {
  changeOrigin: true
};

const apiProxy = httpProxy.createProxyServer(proxyOptions);

const apiUrl =
  "https://www.metaweather.com/api/location/search/?lattlong=36.96,-122.02";

/*
https://www.metaweather.com/api/location/search/?lattlong=36.96,-122.02 - failed with 500
https://www.metaweather.com/api/location/2487956/ - passed
https://www.metaweather.com/api/location/2487956/2013/4/30/ - passed
*/

app
  .prepare()
  .then(() => {
    const server = express();

    server.use("/api", (req, res) => {
      console.log("Going to call this API " + apiUrl);
      apiProxy.web(req, res, { target: apiUrl });
    });

    server.get("*", (req, res) => {
      return handle(req, res);
    });

    server.listen(3000, err => {
      if (err) throw err;
      console.log("> Ready on http://localhost:3000");
    });
  })
  .catch(ex => {
    console.error(ex.stack);
    process.exit(1);
  });

Thanks for looking into this question.


Solution

  • I have tracked down where this happens in node-http-proxy.

    In common.js there is a function called urlJoin, which appends req.url to the end of the target URL.

    I'm not exactly sure what the intent is, but it's a start.

    Here's my test:

    const urlJoin = function() {
      //
      // We do not want to mess with the query string. All we want to touch is the path.
      //
      var args = Array.prototype.slice.call(arguments),
          lastIndex = args.length - 1,
          last = args[lastIndex],
          lastSegs = last.split('?'),
          retSegs;

      args[lastIndex] = lastSegs.shift();

      //
      // Join all strings, but remove empty strings so we don't get extra slashes from
      // joining e.g. ['', 'am']
      //
      retSegs = [
        args.filter(Boolean).join('/')
            .replace(/\/+/g, '/')
            .replace('http:/', 'http://')
            .replace('https:/', 'https://')
      ];

      // Only join the query string if it exists so we don't have a trailing '?'
      // on every request

      // Handle case where there could be multiple ? in the URL.
      retSegs.push.apply(retSegs, lastSegs);

      return retSegs.join('?');
    };

    // First argument: path of the target URL; second: req.url of the incoming request.
    let path = urlJoin('/api/location/search/?lattlong=36.96,-122.02', '/');

    console.log(path);
    // /api/location/search/?lattlong=36.96,-122.02/
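
One possible workaround, given the test above, is to keep the query string out of the target and put it on the incoming request instead, so that urlJoin has nothing to append after it. The following is only a sketch: splitTarget is a hypothetical helper (not part of http-proxy) that uses Node's built-in URL class to split the full API URL into an origin, to be used as the target, and a path plus query string, to be used as req.url.

```javascript
// Hypothetical helper (not part of http-proxy): split a full URL into the
// proxy target (origin only) and the path + query string to request.
function splitTarget(fullUrl) {
  const parsed = new URL(fullUrl); // WHATWG URL class, a global in Node.js
  return {
    target: parsed.origin,                 // scheme + host, no path
    path: parsed.pathname + parsed.search, // path followed by its query string
  };
}

const { target, path } = splitTarget(
  "https://www.metaweather.com/api/location/search/?lattlong=36.96,-122.02"
);

console.log(target); // https://www.metaweather.com
console.log(path);   // /api/location/search/?lattlong=36.96,-122.02

// Inside the Express handler from the question, this might be used as
// (assumption: overriding req.url before proxying is acceptable here):
//   req.url = path;
//   apiProxy.web(req, res, { target });
```

With this split, the query string arrives as the last argument to urlJoin, which sets it aside before joining the paths and re-attaches it at the end. The http-proxy README also documents an ignorePath option that skips appending the incoming request path; that may be a simpler fix, but I have not tested it against this API.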