I am trying to encode letters using a Huffman tree.
I already have my tree, I am now given a string of characters and I am trying to encode it using the tree.
I don't want/cannot use dict().
I am running into a problem. b is supposed to be encoded as 01 and trying it with one b I am getting 0101010101 (as if there were 5 b)
my python code:
def go_through_tree(huffmanTree, encod, el, result):
"""
go through the tree returning the encoded element.
"""
if huffmanTree.left == None and huffmanTree.right == None:
letter = huffmanTree.key
if letter == el:
return encod
else:
child0 = huffmanTree.right
child1 = huffmanTree.left
result += go_through_tree(child0, encod + "0", el, result)
result += go_through_tree(child1, encod + "1", el, result)
return result
def encodedata(huffmanTree, dataIN):
"""
Encodes the input string to its binary string representation.
"""
result = ""
for el in dataIN:
result += go_through_tree(huffmanTree, "", el, "")
return result
There is indeed a problem with the logic. When there is a match, the function returns the encoding. In that case result
becomes that encoding. So far, so good. But then we get to another leaf in the tree, and if letter == el
is False this time, and so the function just returns the result
it already had. But now comes the problem: this result is appended to the result we already have (using the +=
operator). This is why you get repetitions.
You should just exit the recursion as soon as you have a match. There is no need to continue the search further.
So, you don't need the result
parameter, and when you get a result back, you should exit immediately propagating that result to the caller. If there is no match, return the empty string.
def go_through_tree(huffmanTree, encod, el):
if huffmanTree.left is None and huffmanTree.right is None:
return encod if huffmanTree.key == el else "" # Empty when no match
else:
# Return the first search that is successful (using OR):
return (go_through_tree(huffmanTree.right, encod + "0", el)
or go_through_tree(huffmanTree.left, encod + "1", el))
def encodedata(huffmanTree, dataIN):
return "".join(go_through_tree(huffmanTree, "", el) for el in dataIN)