
Why are WebSockets masked?


I was following MDN's guide on Writing a WebSocket server. The guide is pretty straightforward and easy to understand...

However, while following the tutorial I came across the frame format that WebSocket messages from the client are sent in:


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

After writing some functions to properly parse the frame and unmask the data sent by the client, I started to wonder why the data is masked to begin with. I mean, you don't have to mask data you're sending from the server...
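For context, the parsing and unmasking step described above can be sketched in Python roughly as follows. This is only a minimal sketch under assumptions: the names `apply_mask` and `parse_client_frame` are hypothetical, the whole frame is assumed to already be in memory, and fragmentation and control-frame rules are ignored. The sample bytes are the masked "Hello" text frame given as an example in RFC 6455.

```python
import struct


def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with key[i % 4]; the same call masks and unmasks."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def parse_client_frame(frame: bytes) -> tuple[int, bytes]:
    """Return (opcode, payload) for one complete frame held in memory."""
    opcode = frame[0] & 0x0F           # low 4 bits of the first byte
    masked = bool(frame[1] & 0x80)     # MASK bit
    length = frame[1] & 0x7F           # 7-bit payload length
    offset = 2
    if length == 126:                  # 16-bit extended length follows
        (length,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif length == 127:                # 64-bit extended length follows
        (length,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    key = b""
    if masked:                         # masking key comes right after the length
        key = frame[offset:offset + 4]
        offset += 4
    payload = frame[offset:offset + length]
    return opcode, apply_mask(payload, key) if masked else payload


# Masked single-frame text message "Hello" (example from RFC 6455)
frame = bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D, 0x7F, 0x9F, 0x4D, 0x51, 0x58])
print(parse_client_frame(frame))  # (1, b'Hello')
```

Because masking is a plain XOR with a key carried in the frame itself, applying `apply_mask` twice with the same key returns the original bytes.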

If someone were intercepting the data for bad reasons, it would be relatively easy to unmask it, because the masking key is included in every frame. And even if they didn't have the key, the masking key is only 4 bytes long, so someone could easily recover the data since the key is very, very small.

Another reason I'm wondering why the data is masked is that you can protect your WebSocket data far better than masking does by using WSS (WebSocket Secure), which runs over TLS/SSL just as HTTPS does.

Am I missing the point of why WebSockets are masked? Seems like it just adds pointless struggle to unmask the data sent by the client when it doesn't add any security to begin with.


Solution

  • jfriend00's comment has great links to good information...

    I do want to point out the somewhat obvious, to show that masking unencrypted WebSocket connections is a necessary requirement rather than just a beneficial one:

    Proxies, routers and other intermediaries (esp. ISPs) often read the requests sent by a client and "correct" any issues, add headers and otherwise "optimize" network resource consumption (for example, by responding from a cache).

    Some headers and request types (such as CONNECT) are often directed at these intermediaries rather than at the endpoint server.

    Since many of these devices are older and unaware of the WebSocket protocol, clear text that looks like an HTTP request might be edited or acted upon.

    Hence, it was necessary that clear text be "shifted" into unrecognizable bytes, so that intermediaries "pass through" the data rather than "process" it.

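    After this point, it was just about making sure hackers couldn't "reverse" the masking to send malicious frames. That is why the client picks a fresh, unpredictable masking key for every frame: a malicious script can't choose a payload whose masked bytes on the wire would still look like an HTTP request to an intermediary.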

    As for requiring wss instead of masking - I know this was considered during the writing of the standard... but until certificates are free, this would make any web standard requiring SSL/TLS a "rich man's" standard rather than an internet-wide solution.

    As for "why mask wss data?" - I'm not sure about this one, but I suspect that it is meant to allow the parser to be connection agnostic and easier to write. In clear text, unmasked client frames are a protocol error and result in a disconnection initiated by the server. Having the parser behave the same way regardless of the connection allows us to separate the parser from the raw IO layer, making it connection agnostic and offering support for event-based programming.
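The two ideas in the answer above (masking turns HTTP-looking clear text into bytes an intermediary won't recognize, and the server treats an unmasked client frame as a protocol error on any transport) can be sketched as follows. This is an illustrative sketch under assumptions, not a full implementation: the names `build_client_text_frame` and `require_masked` are hypothetical, only payloads under 126 bytes are handled, and a `ValueError` stands in for closing the connection.

```python
import secrets


def mask(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with key[i % 4]; the same call masks and unmasks."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_client_text_frame(payload: bytes) -> bytes:
    """Build one masked text frame (sketch: payloads of at most 125 bytes)."""
    if len(payload) > 125:
        raise ValueError("this sketch only handles short payloads")
    key = secrets.token_bytes(4)                 # fresh, unpredictable key per frame
    header = bytes([0x81, 0x80 | len(payload)])  # FIN + text opcode, MASK bit + length
    return header + key + mask(payload, key)


def require_masked(frame: bytes) -> None:
    """Server-side check that does not depend on ws vs. wss."""
    if not frame[1] & 0x80:
        raise ValueError("protocol error: client frame is not masked")


# A payload that would look like an HTTP request if sent in clear text
text = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = build_client_text_frame(text)
key, wire_payload = frame[2:6], frame[6:]

require_masked(frame)                    # passes: MASK bit is set
print(wire_payload)                      # scrambled bytes, different on every run
print(mask(wire_payload, key) == text)   # the server recovers the original text
```

Since the key comes from `secrets.token_bytes`, the sender of the payload cannot know in advance which bytes will appear on the wire, which is the property the masking relies on.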