Search code examples
pythontreehuffman-code

How do get the correct encoded data with huffman tree?


I am trying to encode letters using a Huffman tree.

I already have my tree, I am now given a string of characters and I am trying to encode it using the tree.

I don't want/cannot use dict().

I am running into a problem. b is supposed to be encoded as 01 and trying it with one b I am getting 0101010101 (as if there were 5 b)

my python code:

def go_through_tree(huffmanTree, encod, el, result):
    """
    go through the tree returning the encoded element.
    """
    if huffmanTree.left == None and huffmanTree.right == None:
        letter = huffmanTree.key
        if letter == el:
            return encod
    else:
        child0 = huffmanTree.right
        child1 = huffmanTree.left
        result += go_through_tree(child0, encod + "0", el, result)
        result += go_through_tree(child1, encod + "1", el, result)
    return result

def encodedata(huffmanTree, dataIN):
    """
    Encodes the input string to its binary string representation.
    """
    result = ""
    for el in dataIN:
        result += go_through_tree(huffmanTree, "", el, "")
    return result

Solution

  • There is indeed a problem with the logic. When there is a match, the function returns the encoding. In that case result becomes that encoding. So far, so good. But then we get to another leaf in the tree, and if letter == el is False this time, and so the function just returns the result it already had. But now comes the problem: this result is appended to the result we already have (using the += operator). This is why you get repetitions.

    You should just exit the recursion as soon as you have a match. There is no need to continue the search further.

    So, you don't need the result parameter, and when you get a result back, you should exit immediately propagating that result to the caller. If there is no match, return the empty string.

    def go_through_tree(huffmanTree, encod, el):
        if huffmanTree.left is None and huffmanTree.right is None:
            return encod if huffmanTree.key == el else ""  # Empty when no match
        else:
            # Return the first search that is successful (using OR):
            return (go_through_tree(huffmanTree.right, encod + "0", el)
                 or go_through_tree(huffmanTree.left,  encod + "1", el))
    
    
    def encodedata(huffmanTree, dataIN):
        return "".join(go_through_tree(huffmanTree, "", el) for el in dataIN)