Search code examples
node.jshttp-proxy

Is it possible to distinguish a requestURL as one typed in the address bar to log in a node proxy?


I just could not get the http-proxy module to work properly as a forward proxy. It works great as a reverse proxy. Therefore, I have implemented a node-based forward proxy using node's http and net modules. It works fine, both with http and https. I will deal with websockets later. Among other things, I want to log the URLs visited or requested through a browser. In the request object, I do get the URL, but as expected, when a page loads, a zillion other requests are triggered, including AJAX, third-party ads, etc. I do not want to log these.

I know that I can distinguish an AJAX request from the x-requested-with header. I can distinguish requests coming from a browser by examining the user-agent header (though these can be spoofed thru cURL). I want to minimize the log entries.

How do commercial proxies log such info? Or do they just log every request? One way would be to not log any requests within a certain time after the main request presuming that they are all associated with the main request. That would not be technically accurate.

I have researched in this area but did not find any solution. I am not looking for any specific code, just some direction...


Solution

  • No one can know that with precision, but you can find clues such as, "HTTP referer", "x-requested-with" or add your custom headers in each ajax request (squid proxy by default sends a "X-Forwarded-For" which says he is a proxy), but anybody can figure out what headers are you sending for your requests or copy all headers that a common browser sends by default and you will believe it is a person from a browser, but could be a bash cURL sent by a bot. So, really, you can't know for example, if a request is an AJAX request because the headers aren't mandatory, by default your browser or your framework adds an x-requested-with or useful information to help to "guess" who is performing the request.