Search code examples
memorycompressionhuffman-codememory-efficient

How to write a hashmap to a file in a memory efficient format?


I am writing a Huffman Coding/Decoding algorithm and I am running into the problem that the storing the Huffman tree is taking up way to much room. Currently, I am converting the tree into a hashMap as such -> hashMap<Character(s),Huffman Code> and then storing that hash map. The issue is that, while the string is compressed great, adding the Huffman Tree data stored in the hash map is adding so much overhead that it's actually ending up bigger than the original. Currently I am just naively writing [data, value] pairs to the file, but I imagine there must be some sort of trickier way to do that. Any ideas?


Solution

  • You do not need the tree in order to encode. All you need is the bit lengths for each symbol and a way to order the symbols. See Canonical Huffman Code.

    In fact, all you need is the symbols that are coded ordered by bit length, and within bit length sorted by symbol, and then the number of codes of each length. With just those two things you can encode.