
What is the mask in a WebSocket frame?


I am working on a WebSocket implementation and do not understand the purpose of the mask in a frame.

Could somebody explain what it does and why it is recommended?

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+

Solution

  • WebSockets are defined in RFC 6455, which states in Section 5.3:

    The unpredictability of the masking key is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire.

    In a blog entry about WebSockets I found the following explanation:

    masking-key (32 bits): if the mask bit is set (and trust me, it is if you write for the server side) you can read for unsigned bytes here which are used to xor the payload with. It's used to ensure that shitty proxies cannot be abused by attackers from the client side.

    But the clearest answer I found was in a mailing list archive, where John Tamplin states:

    Basically, WebSockets is unique in that you need to protect the network infrastructure, even if you have hostile code running in the client, full hostile control of the server, and the only piece you can trust is the client browser. By having the browser generate a random mask for each frame, the hostile client code cannot choose the byte patterns that appear on the wire and use that to attack vulnerable network infrastructure.

    As kmkaplan stated, the attack vector is described in Section 10.3 of the RFC.
    Masking is a measure to prevent proxy cache poisoning attacks [1]. It works by introducing randomness: the client XORs every payload byte with a byte of the random 32-bit masking key, so the application cannot predict which bytes appear on the wire.

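    By the way: it isn't just recommended. It is obligatory for every frame a client sends to a server (RFC 6455, Section 5.1); a server, in turn, must not mask the frames it sends to the client.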

    [1] Huang, Lin-Shung, et al. "Talking to yourself for fun and profit." Proceedings of W2SP (2011).
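To make the XOR step concrete, here is a minimal sketch in Python of how a client could mask a frame and how a receiver could undo it. The function names (`apply_mask`, `build_masked_text_frame`, `read_payload`) are made up for this illustration and do not come from any library. The sketch assumes a single, complete, unfragmented frame and skips all validation a real implementation would need.

```python
import os
import struct
from typing import Optional


def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR payload byte i with key[i % 4]. The same call masks and unmasks."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_masked_text_frame(payload: bytes, key: Optional[bytes] = None) -> bytes:
    """Build one final, masked text frame (client-to-server direction)."""
    if key is None:
        key = os.urandom(4)  # fresh, unpredictable key for every frame
    first = 0x80 | 0x1  # FIN = 1, opcode = 0x1 (text)
    n = len(payload)
    if n < 126:
        header = struct.pack("!BB", first, 0x80 | n)  # 0x80 sets the MASK bit
    elif n < (1 << 16):
        header = struct.pack("!BBH", first, 0x80 | 126, n)  # 16-bit length
    else:
        header = struct.pack("!BBQ", first, 0x80 | 127, n)  # 64-bit length
    return header + key + apply_mask(payload, key)


def read_payload(frame: bytes) -> bytes:
    """Extract the payload of one complete frame, unmasking it if MASK is set."""
    masked = bool(frame[1] & 0x80)
    n = frame[1] & 0x7F
    offset = 2
    if n == 126:
        (n,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif n == 127:
        (n,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    if not masked:
        return frame[offset:offset + n]
    key = frame[offset:offset + 4]  # masking key sits right before the payload
    offset += 4
    return apply_mask(frame[offset:offset + n], key)
```

With the key `37 fa 21 3d`, `build_masked_text_frame(b"Hello", bytes.fromhex("37fa213d"))` produces the bytes `81 85 37 fa 21 3d 7f 9f 4d 51 58`, which matches the masked "Hello" example in Section 5.7 of RFC 6455. Because XOR is its own inverse, applying `apply_mask` a second time with the same key restores the original payload.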