We just started on Huffman tree's in class and I'm having a few issues. First given the data and frequencies...
Data % / - + *
Frequencies 5 10 25 30 50
Create a custom Huffman tree.
I created...
120
/ \
50 70
/ \
30 40
/ \
15 25
/ \
5 10
And then replaced the frequencies with the corresponding data, but my roommate received a different answer? Am I the one in the wrong here?
Also, I can't seem to wrap my head around this question,
What would the Huffman code look like if all symbols in the alphabet had
equal frequency?
Any and all help is much appreciated!
P.S. These are just study guide questions, not homework related.
EDIT: How I arrived at my answer:
Took 5
and 10
at the bottom of the tree, added those together to get a "ghost" node 15
. Added 25
to the right of that because it is bigger, then created a ghost node 40
by adding those together. put 30
to the left of 40
because it is smaller, and then created a ghost node 70
by adding the two. Finally added 50
to the left of 70
because it was smaller and then created the final ghost node 120
by adding the two.
This is what my Huffman.pm
comes up with:
0 -- *
10 -- +
110 -- -
1110 -- /
1111 -- %
The above are the shortest symbols encoding the input string. It is a tree, with an implicit root:
Root
/ \
0(*) 1
/ \
0(+) 1
/ \
0(-) 1
/ \
0(/) 1(%)
There are different ways to encode it, but it must be consistent. As per wikipedia:
the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol
Your tree
120
/ \
50(*) 70
/ \
30(+) 40
/ \
15 25(-)
/ \
5(%) 10(/)
satisfies this, although with a different encoding:
0 *
10 +
1100 %
1101 /
111 -
The reason our trees are different is because mine is a Canonical Huffman Code, which means that I only have to list the symbols and length of their encoding (or path) for anyone to be able to re-construct the Huffman codes/tree:
*: 1
+: 2
-: 3
/: 4
%: 4
This is because a 0
always terminates a code, and 1
always means that at least 1 more node will follow, except for the outermost leaf (%
, 1111
), which is possible because we know the maximum code length (the maximum depth of the tree).