html error-handling http-status-code-404 rfc http-status-code-400

What HTTP response code should be returned for URLs containing " or %22?

Per https://www.rfc-editor.org/rfc/rfc7230#section-3.1.1, recipients of an invalid request-line SHOULD respond with either a 400 (Bad Request) error or a 301 (Moved Permanently) redirect with the request-target properly encoded. So, per the RFC, the request GET /cat".html HTTP/1.1 can be answered with a 400.

I've written a server that returns a 400 when it detects a ". A request sent via telnet to my server gets a 400, as expected.

However, when the identical request is sent via a browser, GET /cat".html HTTP/1.1 is converted by the browser and sent as GET /cat%22.html HTTP/1.1. Thus, 400 is not being returned but rather 404 - Not Found since the file cat%22.html is not in my public directory.

I'm confused about what the RFC wants, since it would never be possible to send GET /cat".html HTTP/1.1 via a browser and have an error code of 400 returned. Since cat".html sent via a browser is a bad request, the server should return 400, but that is not possible unless the server is coded to treat %22 as a bad request. However, a filename containing %22 is valid, so the request wouldn't be a 400 Bad Request, although it could be a 404 Not Found.

What am I missing here?

Solution

The HTTP specification says that an HTTP request shouldn't contain a raw ". This has nothing to do with browsers; the specification covers only the protocol. If you try to send a ", your browser URL-encodes it to %22 because " is invalid (it's helping you). So that's a good thing, right?
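The encoding step can be reproduced with Python's standard library. This is only a sketch of percent-encoding in general, not a description of how any particular browser implements it, and the path is the one from the question.

```python
from urllib.parse import quote, unquote

raw_path = '/cat".html'

# Percent-encode characters that may not appear raw in a request target.
# By default quote() leaves "/" untouched (safe="/").
encoded_path = quote(raw_path)
print(encoded_path)  # /cat%22.html

# Decoding on the receiving side recovers the original name.
decoded_path = unquote(encoded_path)
print(decoded_path)  # /cat".html
```

The encoded form is what actually travels on the wire, so the server never sees the raw " in this case.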

it would never be possible to send GET /cat".html HTTP/1.1

You're presuming that all HTTP is generated by browsers; it's not. Many technologies and pieces of software generate HTTP, and not all of them will kindly URL-encode your request for you.
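As an illustration of a non-browser client, here is a minimal sketch that writes the request line to a socket exactly as given, much like the telnet test in the question. The function names, host, and port are hypothetical and not taken from the question.

```python
import socket


def build_request(path: str, host: str) -> bytes:
    """Build a raw HTTP/1.1 request; the path is NOT URL-encoded."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def send_raw(host: str, port: int, path: str) -> str:
    """Send the request as-is and return the response's status line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(path, host))
        response = b""
        # Read until the first line of the response is complete.
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Calling something like `send_raw("localhost", 8080, '/cat".html')` against a server such as the one described in the question (assuming it listens on that port) would be expected to yield a 400 status line, because the raw " reaches the server untouched.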

BTW: You shouldn't really assume that all browsers will do this either; to assume makes an ass out of u and me ;)

TL;DR

If your HTTP request contains an actual ", return a 400.

If your HTTP request has URL-encoded the " to %22, it is valid and should be processed accordingly (this may result in a 404).
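The two rules above can be sketched as a small decision function. This is a simplified illustration under stated assumptions: the regular expression is a loose stand-in for the RFC's request-line grammar, and `status_for` and `existing_files` are hypothetical names, not part of the asker's server.

```python
import re
from urllib.parse import unquote

# Loose request-line check: method, target, version, single spaces between.
# The target may not contain whitespace or a raw double quote.
REQUEST_LINE = re.compile(r'^([A-Z]+) ([^\s"]+) (HTTP/\d\.\d)$')


def status_for(request_line: str, existing_files: set) -> int:
    """Return 400 for a malformed request line, else 200 or 404."""
    match = REQUEST_LINE.match(request_line)
    if match is None:
        # A raw " (or any other violation of the pattern) is a bad request.
        return 400
    # %22 is valid on the wire; decode it before looking up the file.
    path = unquote(match.group(2))
    return 200 if path in existing_files else 404


files = {"/index.html"}
print(status_for('GET /cat".html HTTP/1.1', files))    # 400
print(status_for("GET /cat%22.html HTTP/1.1", files))  # 404
```

A real server would also need to guard against things like path traversal after decoding; that is left out here to keep the focus on the 400 versus 404 distinction.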