Search code examples
javaalgorithmdata-structureshuffman-codesplay-tree

How can I use splay tree data structure in huffman coding for data compression?


First of all, I am new to programming, so I would expect simple and well explained answers. Secondly this is a very specific question and I don't want moderators and other users to just close this question as off-topic or for being too broad.

Anyway I want to implement Huffman coding in java using some kind of a data structure. But ,however, I was thinking of using splay tree as it's something that will not be covered in my course's syllabus and also since I want to learn a new data structure. Now the main question is if the Huffman coding algorithm would require splay tree data structure in the first place?

What can I use splay tree for in my Huffman based data compression project? Or would you rather suggest a better(for it's efficiency and maybe creativity in the context that it's unique and not so heard of) data structure for this project?

Thanks


Solution

  • Any Huffman code can be represented by the structure of a binary tree, whose leaves are the symbols to be encoded. When following a path from the root to the symbol to be encoded, left and right branches can be represented as 0 or 1 bits; the result is a correct prefix code, with code lengths specified by the depth of the symbols.

    Ideally, you would use the structure of the splay tree directly, to determine the Huffman code for each symbol. However, splay trees maintain their data in the nodes, not the leaves. You will either need to find some way to use a splay tree based on data in the leaves, or come up with a transformation that computes a valid (and efficient) set of prefix codes from node locations instead.

    One possibility is to maintain the leftmost and rightmost leaf of each subtree in its root node (to be updated as the tree is splayed, of course). This should allow you to search for leaves, even though you don't actually care about your node data as such. Conventional splaying operations should then naturally generate a dynamic Huffman code biased towards recently occurring symbols.