Search code examples
javaarraysfilehuffman-codecompression

Java - My Huffman decompression refuses to decompress non-text files (returns empty file)


I am able to compress all kinds of files (.jpg, .mp4 etc.) but when I try to decompress these non-text files the program just returns an empty decompressed file... the weird part is that I am able to decompress plain text files just fine.

When I compress my original file I put both the data needed to reconstruct the tree and the encoded bits in the same file. The format looks something like this:

<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]

Where n is the total number of unique bytes (AKA number of leafs in my tree), value is just my leaf values in byte form and frequency is the frequency of each byte/"character" (frequency is an int value so it consists of 4 bytes per frequency).

BitFileReader and BitFileWriter in my code are just wrapper classes for a BufferedOutStream/InputStream with the added functionality of reading/writing bit by bit.

I added my whole Huffman code below but the main focus is the compress() and decompress() methods at the bottom. At the very least I want to know if my logic for these methods is fine, and if so what is causing it to return empty decompressed files when decompressing other file types (that aren't plain text files)?

Huffman Code:

public class HuffmanCode {


    public static Tree buildTree(int[] charFreqs) {
        PriorityQueue<Tree> trees = new PriorityQueue<Tree>();

        for (int i = 0; i < charFreqs.length; i++){
            if (charFreqs[i] > 0){
                trees.offer(new Leaf(charFreqs[i], i));
            }
        }

        //assert trees.size() > 0;

        while (trees.size() > 1) {
            Tree a = trees.poll();
            Tree b = trees.poll();

            trees.offer(new Node(a, b));
        }
        return trees.poll();
    }

    public static void printStruct(Tree tree) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            System.out.println(leaf.value + " " + leaf.frequency);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            printStruct(node.left);

            // traverse right
            printStruct(node.right);
        }
    }


    public static void printStruct(Tree tree, StringBuffer prefix) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            printStruct(node.left, prefix);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            printStruct(node.right, prefix);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            treeMap.put(leaf.value, prefix.toString());

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            fillEncodeMap(node.left, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            fillEncodeMap(node.right, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            treeMap.put(prefix.toString(), leaf.value);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            fillDecodeMap(node.left, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            fillDecodeMap(node.right, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }



    public static void compress(File file){
        try {
            Path path = Paths.get(file.getAbsolutePath());
            byte[] content = Files.readAllBytes(path);
            TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
            File nF = new File(file.getName()+"_comp");
            nF.createNewFile();
            BitFileWriter bfw = new BitFileWriter(nF);

            int[] charFreqs = new int[256];

            // read each byte and record the frequencies
            for (byte b : content){
                charFreqs[b&0xFF]++;
            }

            // build tree
            Tree tree = buildTree(charFreqs);

            // build TreeMap
            fillEncodeMap(tree, new StringBuffer(), encodeMap);

            // Writes tree structure in binary form to nF (new file)
            bfw.writeByte(encodeMap.size());
            for(int i=0; i<charFreqs.length; i++){
                if(charFreqs[i] != 0){
                    ByteBuffer b = ByteBuffer.allocate(4);
                    b.putInt(charFreqs[i]);
                    byte[] result = b.array();

                    bfw.writeByte(i);
                    for(int j=0; j<4;j++){
                        bfw.writeByte(result[j]&0xFF);
                    }
                }
            }

            // Write compressed data
            for(byte b : content){
                String code = encodeMap.get(b&0xFF);
                for(char c : code.toCharArray()){
                    if(c == '1'){
                        bfw.write(1);
                    }
                    else{
                        bfw.write(0);
                    }
                }
            }
            bfw.close();
            System.out.println("Compression successful!");

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void decompress(File file){
        try {
            BitFileReader bfr = new BitFileReader(file);
            int[] charFreqs = new int[256];
            TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
            File nF = new File(file.getName()+"_decomp");
            nF.createNewFile();
            BitFileWriter bfw = new BitFileWriter(nF);
            DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));

            int uniqueBytes;
            int counter = 0;
            int byteCount = 0;
            uniqueBytes = data.readUnsignedByte();

            // Read frequency table
            while (counter < uniqueBytes){
              int index = data.readUnsignedByte();
              int freq = data.readInt();
              charFreqs[index] = freq;
              counter++;
            }

            // build tree
            Tree tree = buildTree(charFreqs);

            // build TreeMap
            fillDecodeMap(tree, new StringBuffer(), decodeMap);

            // Skip BitFileReader position to actual compressed code
            bfr.skip(uniqueBytes*5);

            // Get total number of compressed bytes
            for(int i=0; i<charFreqs.length; i++){
                if(charFreqs[i] > 0){
                    byteCount += charFreqs[i];
                }
            }

            // Decompress data and write
            counter = 0;
            StringBuffer code = new StringBuffer();

            while(bfr.hasNextBit() && counter < byteCount){
                code.append(""+bfr.nextBit());

                if(decodeMap.containsKey(code.toString())){
                    bfw.writeByte(decodeMap.get(code.toString()));
                    code.setLength(0);
                    counter++;
                }
            }
            bfw.close();
            bfr.close();
            data.close();

            System.out.println("Decompression successful!");

        } 

        catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        File f = new File("test");
        compress(f);
        f = new File("test_comp");
        decompress(f);
    }
}

EDIT: I found the cause but I don't know how to fix it or why it happens. The problem is that my charFreqs[] array in my decompression method never gets filled (all its values are still at zero AKA all bytes have zero frequency according to the array).


Solution

  • I solved it! The problem was the bfw.writeByte(encodeMap.size()) line in my compress() method. It would only write bytes to the file but the encodeMap.size() function can return a value of 256 if it is full. 256 is a higher value than a byte can hold (bfw.writeByte() actually takes an int as a argument but it only writes the 8 lowest bits of the int, essentially only the bits that a byte can hold, so in a way it actually has the range of an unsigned byte 0-255).

    I solved this by changing two lines of code. The line bfw.writeByte(encodeMap.size()) in my compress() method was changed to bfw.writeByte(encodeMap.size()-1) and the line uniqueBytes = data.readUnsignedByte() in my decompress() method was changed to uniqueBytes = data.readUnsignedByte() + 1