Search code examples
treetheoryhuffman-codealphabet

Huffman Tree Issue


We just started on Huffman tree's in class and I'm having a few issues. First given the data and frequencies...

 Data         %    /    -    +     *
 Frequencies  5    10   25   30    50

Create a custom Huffman tree.

I created...

                            120
                           /   \
                          50   70
                               /  \
                              30   40
                                   / \
                                  15  25
                                 / \
                                 5  10

And then replaced the frequencies with the corresponding data, but my roommate received a different answer? Am I the one in the wrong here?

Also, I can't seem to wrap my head around this question,

What would the Huffman code look like if all symbols in the alphabet had
equal frequency?

Any and all help is much appreciated!

P.S. These are just study guide questions, not homework related.

EDIT: How I arrived at my answer:

Took 5 and 10 at the bottom of the tree, added those together to get a "ghost" node 15. Added 25 to the right of that because it is bigger, then created a ghost node 40 by adding those together. put 30 to the left of 40 because it is smaller, and then created a ghost node 70 by adding the two. Finally added 50 to the left of 70 because it was smaller and then created the final ghost node 120 by adding the two.


Solution

  • This is what my Huffman.pm comes up with:

    0  -- *
    10  -- +
    110  -- -
    1110  -- /
    1111  -- %
    

    The above are the shortest symbols encoding the input string. It is a tree, with an implicit root:

       Root
      /    \
    0(*)    1
           /  \
         0(+)  1
              /  \
            0(-)  1
                 /  \
               0(/) 1(%)
    

    There are different ways to encode it, but it must be consistent. As per wikipedia:

    the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol

    Your tree

                            120
                           /   \
                        50(*)    70            
                               /  \
                            30(+)  40          
                                   / \
                                  15  25(-)
                                 /  \
                              5(%)  10(/)
    

    satisfies this, although with a different encoding:

    0     *
    10    +
    1100  %
    1101  /
    111   -
    

    The reason our trees are different is because mine is a Canonical Huffman Code, which means that I only have to list the symbols and length of their encoding (or path) for anyone to be able to re-construct the Huffman codes/tree:

    *: 1
    +: 2
    -: 3
    /: 4
    %: 4
    

    This is because a 0 always terminates a code, and 1 always means that at least 1 more node will follow, except for the outermost leaf (%, 1111), which is possible because we know the maximum code length (the maximum depth of the tree).