Search code examples
c++calgorithmhuffman-code

Huffman encode single character without lookup table


I am trying to implement Huffman Coding and can't figure out how to encode a character using the trie without generating a lookup table. What I don't want to do is generate a map of each character to its encoded string of bits. I am trying to write a method which takes in the root of the Huffman trie and a character and spits out the correct code.

I have written the following code which doesn't work correctly. The problem is, I don't know how to get it to stop after getting the right result. It continues to add to the code after reaching the correct leaf node.

string lookup(HuffNode* root, char c, string prevCode, string direction){
  string currentCode = prevCode + direction;
  if(!root->isLeaf()){
    currentCode = lookup(root->getLeft(), c, currentCode, "0");
    currentCode = lookup(root->getRight(), c, currentCode, "1");
  }else{
    if(root->getChar() == c){
      return currentCode;
    }else{
      return prevCode;
    }
  }

  return currentCode;
}

string encodeChar(HuffNode* trie, char c){
  return lookup(trie, c, "", "");
} 

What would be the correct way to get the encoding of a character from a trie?


Solution

  • It would be terribly inefficient to have to search a good chunk of the tree every single time you want to emit a code. You should in fact make a table of codes for each symbol.

    In any case, what you need to do is build up the code in a single string that is passed by reference. Do not return the string. (It will be there when it's all done in the string that was originally passed by reference.) Instead return true or false for if the symbol was found in a leaf. Then when you call lookup() on each branch, see if it returns true. If it does, then immediately return true. For each call, add the appropriate bit to the string (0 or 1). If it returns false for the second branch, remove the bit you added, and return false.

    Then the original call will return true if it finds the symbol, and the string will have the code. If it returns false, the symbol wasn't in any of the leaves, and the string will be empty.