encoding, huffman-code

How does Huffman encoding know the length of each value code it's reading?


I'm trying to understand how Huffman encoding works. All of the summaries I've read explain how to generate the value codes, but not the full process of actually reading them back in. I'm wondering how the algorithm knows the bit length of each value it's reading.

For example, if you are representing the symbol string "ETQ A" with the code series "01 -- 110 -- 1101 -- 1 -- 10", how do you know where one symbol begins and another one ends? How do you know to read two bits for the first symbol, three bits for the second, and so on?


Solution

  • When decoding, there are two givens:

    1. Huffman encoding tree
    2. Input bytes

    You then consume bits one at a time, walking the tree from the root, until you reach a leaf node; at that point you have decoded a single character.

    [Figure: Huffman decoding tree animation (source: kirk at people.cs.pitt.edu)]

    You then continue consuming bits in the same way until all characters are decoded (a short code sketch follows the quoted description below).

    Taken from http://people.cs.pitt.edu/~kirk/cs1501/animations/Huffman.html

    The decoding procedure is deceptively simple. Starting with the first bit in the stream, one then uses successive bits from the stream to determine whether to go left or right in the decoding tree. When we reach a leaf of the tree, we've decoded a character, so we place that character onto the (uncompressed) output stream. The next bit in the input stream is the first bit of the next character.
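
    To make the walk concrete, here is a minimal Python sketch of that decode loop. The Node class, the three-symbol example tree (A=0, B=10, C=11), and the sample bit string are assumptions for illustration only; they are not taken from the linked page. The point to notice is that no code lengths are stored anywhere: the shape of the tree decides where each code ends.

        class Node:
            """One node of the Huffman tree; only leaves carry a symbol."""
            def __init__(self, symbol=None, left=None, right=None):
                self.symbol = symbol   # set only on leaf nodes
                self.left = left       # branch taken on a 0 bit
                self.right = right     # branch taken on a 1 bit

        def huffman_decode(bits, root):
            """Decode a string of '0'/'1' characters using the given tree."""
            out = []
            node = root
            for bit in bits:
                # Each bit picks a branch; we never look up a length.
                node = node.left if bit == '0' else node.right
                if node.symbol is not None:   # reached a leaf
                    out.append(node.symbol)   # emit the decoded character
                    node = root               # the next bit starts a new code
            return ''.join(out)

        # Hypothetical tree giving the prefix-free codes A=0, B=10, C=11.
        root = Node(left=Node('A'),
                    right=Node(left=Node('B'), right=Node('C')))

        print(huffman_decode('010110', root))   # prints "ABCA"

    Because Huffman codes are prefix-free, reaching a leaf is unambiguous: no code is a prefix of another, so the moment you hit a leaf you know the current symbol is complete and the following bit belongs to the next one.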