Tags: c, http, cross-platform, protocols, implementation

Cross-language HTTP parser?


I've made a rather strange observation.

HTTP is probably the most widely used protocol in the world. I don't know the stats, but it surely accounts for the bulk of all traffic on the internet.

Every major language has a library for creating a web server and for consuming traffic with a web client. And practically every major language is a descendant of C and/or has C bindings.

But even though these two facts seem to point naturally toward one shared foundation for the internet, there is no unified, widely used implementation of an HTTP parser in C.

There is http_parser by Ryan Dahl, there is its descendant llhttp by Fedor Indutny (both used in Node.js), and, I suppose, every framework implements its own parser for its own needs.

Why, though?

I'm mostly worried about compatibility and library support. For example, HTTP/3 rolls out and the library I use in my project doesn't support it, or Node.js is deprecated and its parser is no longer maintained. These arguments may sound silly, but look at it this way: why does everyone implement their own parser if the HTTP specification is clear AND, at the end of the day, practically every language out there has C bindings? Wouldn't it be better for everyone to agree on one code base and use it from every language? There is nothing to compete over. It's a protocol; it is meant to be the same for everyone.

So the question is: is there a cross-language solution used by most servers and HTTP clients? If there isn't, why not?


Solution

  • Several reasons:

    • C is a product of the early 1970s, when systems tended to be monolithic and network-centric architectures were somewhat rare. It was created primarily to implement the Unix operating system. And it has precious little language-level support for much of anything - no native networking, graphics, sound, or much else. That’s why it’s as portable as it is - the language definition makes relatively few demands of the underlying platform. The group that maintains the C standard tends to be conservative about adding features.

    • HTTP is one protocol of many - telnet, SMTP, NNTP, FTP, SSH, etc., all of which are or have been as widely used as HTTP at some point. 30 years ago a good case could have been made for making telnet or FTP support native (which would have required a native TCP/IP stack as well). Now it’s HTTP and HTTPS, which would require a native SSL/TLS implementation.

    Paradigms (and protocols) come and go, but legacy code is forever. Making protocols part of the language makes the language bigger and harder to maintain. New protocols get created, old protocols fall out of favor or are deprecated, leading to more maintenance issues. Each time a protocol is updated you’d need a compiler update (or at least a standard library update).

    Life is just easier if all of that is kept separate from the language itself.

    As for why there are so many different implementations...

    • Different platforms have different APIs - at some point you have to have a system-specific implementation;

    • Different people have different requirements for usability, capability, scalability, and security. A lightweight implementation that may work just fine for individual use may fall down under load;

    • Somebody may just not be aware of an existing implementation and roll their own;

    • And, finally, there’s no referee; standards exist, and groups that maintain and enforce those standards exist, but there’s no one who officially blesses a particular implementation.
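
To make the "different requirements" point concrete, here is a minimal sketch in C of the easiest part of an HTTP/1.x parser: splitting the request line into method, target, and version. The `struct request_line` type and the `parse_request_line` function are hypothetical names invented for this illustration; they are not the API of http_parser, llhttp, or any other library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not from any real library. */
struct request_line {
    const char *method;   /* points into the caller's buffer, not NUL-terminated */
    size_t      method_len;
    const char *target;   /* also points into the caller's buffer */
    size_t      target_len;
    int         version_major;
    int         version_minor;
};

/*
 * Parse "METHOD SP target SP HTTP/x.y CRLF" from buf[0..len).
 * Returns the number of bytes consumed (including CRLF) on success,
 * 0 if no complete line has arrived yet, and -1 if the line is malformed.
 */
long parse_request_line(const char *buf, size_t len, struct request_line *out)
{
    assert(buf != NULL && out != NULL);

    /* Locate the terminating CRLF; without it we cannot decide anything. */
    size_t end = 0;
    int have_crlf = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            end = i;
            have_crlf = 1;
            break;
        }
    }
    if (!have_crlf)
        return 0;

    /* The method ends at the first space and must be non-empty. */
    const char *sp1 = memchr(buf, ' ', end);
    if (sp1 == NULL || sp1 == buf)
        return -1;

    /* The target ends at the next space and must be non-empty. */
    const char *target = sp1 + 1;
    const char *sp2 = memchr(target, ' ', end - (size_t)(target - buf));
    if (sp2 == NULL || sp2 == target)
        return -1;

    /* The version must be exactly "HTTP/<digit>.<digit>" (8 bytes). */
    const char *ver = sp2 + 1;
    size_t ver_len = end - (size_t)(ver - buf);
    if (ver_len != 8 || memcmp(ver, "HTTP/", 5) != 0 || ver[6] != '.')
        return -1;
    if (ver[5] < '0' || ver[5] > '9' || ver[7] < '0' || ver[7] > '9')
        return -1;

    out->method = buf;
    out->method_len = (size_t)(sp1 - buf);
    out->target = target;
    out->target_len = (size_t)(sp2 - target);
    out->version_major = ver[5] - '0';
    out->version_minor = ver[7] - '0';
    return (long)(end + 2);
}
```

A parser like this is enough for a toy server, which is part of why so many projects write their own. It deliberately leaves out everything that makes production parsers differ from one another: header fields, chunked bodies, size limits, incremental (streaming) state, and strictness choices around malformed input. Those are the places where usability, performance, and security requirements pull implementations in different directions.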