Why Is Huffman Encoding Optional in HTTP/2 HPACK?


I want to make sure I understand this correctly. This is from Section 5.2 of RFC 7541:

   Header field names and header field values can be represented as
   string literals.  A string literal is encoded as a sequence of
   octets, either by directly encoding the string literal's octets or by
   using a Huffman code (see [HUFFMAN]).

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | H |    String Length (7+)     |
   +---+---------------------------+
   |  String Data (Length octets)  |
   +-------------------------------+

This means I can send a header string literal either with H set to 1 and a Huffman-encoded string, or with H set to 0 and the original string octets, and any existing HTTP/2 server or implementation should parse both correctly, right?


Solution

  • HTTP headers are basically made up of ASCII characters. ASCII uses fixed-length codes where each character is stored in 8 bits (actually only 7 bits are needed, since HTTP headers only use the 128 codes of the original ASCII character set, so the 8th bit is always 0).

    Huffman encoding uses variable-length codes. More frequently used characters get shorter codes of fewer than 8 bits, while less frequently used characters get codes of 8 bits or more. The theory is that most text is made up of the more frequently used characters, so the encoded form should be shorter than ASCII in most cases. This is especially true since ASCII “wastes” a bit: each basic character needs only 7 bits but is stored in 8.

    However, this also means that some pieces of text are actually longer when Huffman encoded than in plain ASCII.

    The Huffman coding table used in HPACK is given in Appendix B of RFC 7541. As an example, you can see that < is encoded as 111111111111100, which is 15 bits. Therefore the string <<<< takes 4 octets in ASCII but 60 bits, padded to 8 octets, when Huffman encoded.

    That is why HPACK allows you to send the plain ASCII octets in such cases, as that is more efficient.
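
The arithmetic above can be checked with a short sketch. It assumes only a handful of code lengths copied from the HPACK Huffman table (RFC 7541, Appendix B); the names `HUFFMAN_BITS`, `huffman_octets` and `pick_encoding` are hypothetical and chosen for illustration, not taken from any library. A real encoder would need the full 257-entry table and would emit the actual bit patterns, not just count them.

```python
# Code lengths in bits for a few characters, copied from the HPACK
# Huffman table (RFC 7541, Appendix B). This is a small subset that
# covers only the example strings below.
HUFFMAN_BITS = {
    "<": 15,
    "a": 5, "c": 5, "e": 5, "o": 5,
    ".": 6, "l": 6, "m": 6, "p": 6,
    "w": 7, "x": 7,
}


def huffman_octets(text):
    """Size in octets of `text` once Huffman encoded and padded."""
    bits = sum(HUFFMAN_BITS[ch] for ch in text)
    return (bits + 7) // 8  # round up to a whole number of octets


def pick_encoding(text):
    """Return (H flag, payload length in octets) for the shorter form."""
    raw = len(text.encode("ascii"))
    huff = huffman_octets(text)
    # Use Huffman (H = 1) only when it is strictly shorter.
    return (1, huff) if huff < raw else (0, raw)


print(huffman_octets("<<<<"))            # 8
print(pick_encoding("<<<<"))             # (0, 4)
print(pick_encoding("www.example.com"))  # (1, 12)
```

Choosing per string like this is one possible encoder policy. The spec only requires that the H bit correctly describes the payload that follows.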

    Maybe this is a little overcomplicated and we should just accept the slightly less efficient encoding in these rare edge cases (some say the IETF is obsessed with saving bits), but that’s why it’s there.

    Note that receivers can’t control what the other side sends, so every HTTP/2 implementation needs to be able to decode Huffman encoding. So Huffman support itself is not optional (you cannot build a compliant HTTP/2 implementation without it), but using it for any individual header name or value is.
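
As a rough illustration of the receiver side, here is a minimal sketch of reading one string literal laid out as in the Section 5.2 diagram. The function names are hypothetical. The length is decoded as a 7-bit prefix integer as described in RFC 7541, Section 5.1. Actual Huffman decoding of the payload is deliberately left out, since it needs the full Appendix B table; the sketch only reports whether the H bit was set.

```python
def decode_prefixed_int(buf, pos, prefix_bits):
    """Decode an HPACK prefix integer starting at buf[pos].

    Returns (value, position of the next unread octet).
    """
    mask = (1 << prefix_bits) - 1
    value = buf[pos] & mask
    pos += 1
    if value < mask:
        return value, pos  # the value fit inside the prefix
    shift = 0
    while True:
        octet = buf[pos]
        pos += 1
        value += (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:  # top bit clear marks the last octet
            return value, pos


def read_string_literal(buf, pos=0):
    """Return (huffman_flag, payload octets, next position)."""
    huffman = bool(buf[pos] & 0x80)  # H is the top bit of the first octet
    length, pos = decode_prefixed_int(buf, pos, 7)
    data = bytes(buf[pos:pos + length])
    if len(data) != length:
        raise ValueError("truncated string literal")
    # If huffman is True, data still has to be Huffman decoded.
    return huffman, data, pos + length


# H = 0: the payload is the raw octets of "custom-key".
print(read_string_literal(bytes.fromhex("0a637573746f6d2d6b6579")))
# (False, b'custom-key', 11)

# H = 1: 12 octets of Huffman-coded data follow ("www.example.com").
flag, data, end = read_string_literal(
    bytes.fromhex("8cf1e3c2e5f23a6ba0ab90f4ff"))
print(flag, len(data), end)  # True 12 13
```

A decoder has to handle both branches because the sender decides the H bit for each string independently.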

    Btw if interested in understanding HPACK in more detail than the spec gives, then I cover it (including the answer to this question!) in chapter 8 of my book: https://www.manning.com/books/http2-in-action.