http network-programming proxy http-proxy

When implementing a web proxy, how should the server report lower-level protocol errors?

I'm implementing an HTTP proxy. Sometimes when a browser makes a request via my proxy, I get an error such as ECONNRESET, Address not found, and the like. These indicate errors below the HTTP level. I'm not talking about bugs in my program, but about how other servers behave when I send them an HTTP request.

Some servers might simply not exist, others close the socket, and still others do not answer at all.

What is the best way to report these errors to the caller? Is there a standard method that browsers will translate into the appropriate error message? (That is, they get a reply from the proxy that tells them ECONNRESET, and they act as though they had received the ECONNRESET themselves.)

If not, how should it be handled?

Motivations

I really want my proxy to be totally transparent, so that the browser or other client works exactly as if it weren't going through a proxy at all. That means I want to replicate the organic behavior of errors such as ECONNRESET instead of sending an HTTP message with an error code, which would be totally different behavior.

I kind of thought that was the intention when writing an HTTP proxy.

Solution

There are several things to keep in mind.

Firstly, if the client is configured to use the proxy (which actually I'd recommend) then fundamentally it will behave differently than if it were directly connecting out over the Internet. This is mostly invisible to the user, but affects things like:

FTP URLs
some caching differences
authentication to the proxy if required
reporting of connection errors etc. (your question)

In the case of reporting errors, a browser will show its own connectivity error if it can't connect to the proxy, or can't open a tunnel via the proxy. For upstream errors, however, the proxy provides an error page itself (depending on the error: if part of a response has already been sent, the proxy can't do much but close the connection). That page won't look anything like the browser's own error page would.
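As a rough illustration of "the proxy provides a page", here is a minimal Python sketch of how a proxy might turn an upstream socket error into an HTTP error response. The function names are hypothetical, and the choice of 502 Bad Gateway for connection and DNS failures and 504 Gateway Timeout for timeouts is an assumption based on the usual meaning of those status codes, not something a proxy is forced to do.

```python
import socket
from typing import Tuple


def status_for_upstream_error(exc: BaseException) -> Tuple[int, str]:
    """Pick an HTTP status for an upstream failure (assumed mapping)."""
    # Timeouts are checked first because they are also OSError subclasses.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return 504, "Gateway Timeout"
    # DNS failure, reset, refused, and any other socket-level error.
    return 502, "Bad Gateway"


def build_error_response(exc: BaseException) -> bytes:
    """Build a small plain-text error page describing the upstream failure."""
    status, reason = status_for_upstream_error(exc)
    body = (
        "Proxy could not complete the request: "
        f"{type(exc).__name__}\n"
    ).encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

This only works if nothing has been sent to the client yet. Once part of the upstream response has been forwarded, closing the client connection is about the only option left.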

If the browser is NOT configured to use a proxy, then you would need to divert or intercept the connection to the proxy. This can cause problems if you decide you want to authenticate your users against the proxy (to identify them / implement user-specific rules etc).

Secondly, HTTPS can be a real pain in the neck. This problem is growing as more and more sites move to HTTPS only. There are several issues:

for HTTPS URLs, browsers configured to use a proxy will first open a tunnel via the proxy using the CONNECT method. If your proxy wants to prevent this, any information it provides in the block response is ignored by the browser, and you get the generic browser connectivity error page instead.
if you want to provide any of the other benefits one normally expects from a proxy (e.g. caching / scanning etc) you need to implement a MitM (man-in-the-middle) and spoof server SSL certificates etc. In fact you need to do this even if you just want to send back a block page to deny things.
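To make the CONNECT step concrete, here is a small Python sketch of the two pieces a proxy needs before it starts relaying bytes: parsing the tunnel target from the request line, and choosing the reply. The function names are hypothetical, and using 502 for a failed upstream connection is an assumption; as noted above, the browser is unlikely to show whatever body accompanies a non-2xx reply.

```python
from typing import Tuple


def parse_connect_target(request_line: str) -> Tuple[str, int]:
    """Extract (host, port) from a line like 'CONNECT example.com:443 HTTP/1.1'."""
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        raise ValueError("not a CONNECT request line")
    # Split on the last colon so bracketed IPv6 literals keep their colons.
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    return host.strip("[]"), int(port)


def connect_reply(upstream_connected: bool) -> bytes:
    """Reply sent to the client before the tunnel is (or is not) opened."""
    if upstream_connected:
        # After this line the proxy just copies bytes in both directions.
        return b"HTTP/1.1 200 Connection established\r\n\r\n"
    return (
        b"HTTP/1.1 502 Bad Gateway\r\n"
        b"Connection: close\r\n"
        b"Content-Length: 0\r\n"
        b"\r\n"
    )
```

For example, `parse_connect_target("CONNECT example.com:443 HTTP/1.1")` yields the host and port the proxy should try to connect to; the result of that attempt decides which reply is sent.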

There is a way for a browser to go through a proxy and still act a bit more like it was directly connected, and that's SOCKS. SOCKS has a way to return an error code if there's an upstream connection error. It's not the actual socket error code, however.
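The following Python sketch shows what "not the actual socket error code" means in practice for SOCKS5. The reply codes are the ones defined in RFC 1928; which errno maps to which reply code is my own assumption, and the function names are hypothetical. Note that something like ECONNRESET has no dedicated reply code and collapses into "general failure".

```python
import errno
import struct

# SOCKS5 reply codes from RFC 1928, section 6.
REP_GENERAL_FAILURE = 0x01
REP_NETWORK_UNREACHABLE = 0x03
REP_HOST_UNREACHABLE = 0x04
REP_CONNECTION_REFUSED = 0x05

# Assumed mapping from local errno values to SOCKS5 reply codes.
ERRNO_TO_REP = {
    errno.ENETUNREACH: REP_NETWORK_UNREACHABLE,
    errno.EHOSTUNREACH: REP_HOST_UNREACHABLE,
    errno.ECONNREFUSED: REP_CONNECTION_REFUSED,
}


def socks5_reply_code(exc: OSError) -> int:
    """Translate an upstream OSError into a SOCKS5 reply code."""
    return ERRNO_TO_REP.get(exc.errno, REP_GENERAL_FAILURE)


def socks5_failure_reply(exc: OSError) -> bytes:
    """Build a SOCKS5 reply: VER, REP, RSV, ATYP=IPv4, BND.ADDR, BND.PORT."""
    return struct.pack(
        "!BBBB4sH", 0x05, socks5_reply_code(exc), 0x00, 0x01, b"\x00" * 4, 0
    )
```

The client only ever sees the one-byte reply code, so it can distinguish "refused" from "unreachable" but cannot recover the original errno.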

These are all reasons why we wrote the WinGate Internet Client, an LSP-based (Winsock Layered Service Provider) client for our product WinGate. With it, client applications learn the actual upstream error codes etc.

It's not a favoured approach nowadays though, as it requires installation of software on the client computer.