I'm trying to understand how Huffman encoding works. All of the summaries I've read explain how to generate the codes for each value, but not the full process of actually reading them back in. I'm wondering how the algorithm knows the bit-length of each value it's reading.
For example, if you are representing the symbol string "ETQ A" with the code series "01 -- 110 -- 1101 -- 1 -- 10", how do you know where one symbol begins and another one ends? How do you know to read two bits at index 1, three bits at index 2, etc.?
When decoding, there are two givens: the encoded bit stream and the Huffman tree that was used to encode it. Starting at the root of the tree, you consume bits one at a time until you reach a leaf node, at which point you have decoded a single character.
[Figure: the example Huffman decoding tree, from people.cs.pitt.edu]
You then continue consuming bits in the same way until all characters have been decoded.
Taken from http://people.cs.pitt.edu/~kirk/cs1501/animations/Huffman.html
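To make that concrete, here is a minimal sketch of the tree walk in Python. The `Node` class and `decode` function are my own illustration of the procedure described above, not code from the linked page:

```python
class Node:
    """One node of a Huffman tree; a hypothetical representation for illustration."""
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol = symbol   # set only on leaf nodes
        self.left = left       # child followed on a 0 bit
        self.right = right     # child followed on a 1 bit

def decode(bits, root):
    """Decode a string of '0'/'1' characters against a Huffman tree."""
    out = []
    node = root
    for bit in bits:
        # One bit, one step: 0 goes left, 1 goes right.
        node = node.left if bit == "0" else node.right
        if node.symbol is not None:   # reached a leaf: one full symbol read
            out.append(node.symbol)
            node = root               # the next bit starts the next symbol
    return "".join(out)
```

Note that the decoder never needs an explicit length for any code; reaching a leaf is what tells it a symbol has ended.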
The decoding procedure is deceptively simple. Starting with the first bit in the stream, one then uses successive bits from the stream to determine whether to go left or right in the decoding tree. When we reach a leaf of the tree, we've decoded a character, so we place that character onto the (uncompressed) output stream. The next bit in the input stream is the first bit of the next character.
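Applying this to the example in the question: a Huffman code is always prefix-free, meaning no symbol's code is a prefix of another's, which is exactly why symbol boundaries need no separators. (The code series in the question is not prefix-free, since 1 is a prefix of 10, 110, and 1101, which is why it looks ambiguous; a Huffman tree cannot produce such a code.) With a prefix-free assignment, say E=0, T=10, Q=110, space=1110, A=1111 (a hypothetical code, chosen just for illustration), the `decode` sketch above finds every boundary on its own:

```python
# Build the tree for the hypothetical code E=0, T=10, Q=110,
# ' '=1110, A=1111, reusing the Node class from the sketch above.
tree = Node(left=Node("E"),
            right=Node(left=Node("T"),
                       right=Node(left=Node("Q"),
                                  right=Node(left=Node(" "),
                                             right=Node("A")))))

# "0" + "10" + "110" + "1110" + "1111", concatenated with no separators:
print(decode("01011011101111", tree))   # -> "ETQ A"
```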