Search code examples
javafiledata-structurespriority-queuehuffman-code

Converting string to bit and write it in a file in Java


I have written a program about Huffman compression and at the end of the day I can manipulate the string read and change it specific to Huffman. For example, I could get the "1001010000010101111011" string for the "abcdeffg" string entered. My problem is this: I have to write this string I got to a file and save it. I also know file operations, but I don't know how to write this string in bits. The string "abcdeffg" that I read from the file now is 8 bytes, while the string "1001010000010101111011" I created is 22 bytes. So if I write this to the file, I will not be compressing. How can I write this string of 0's and 1's to the file for Huffman Compression to work? The code I have written so far is following;

Class HuffmanEncodedResult:

class HuffmanEncodedResult {
    final Node root;
    final String encodedData;

    public HuffmanEncodedResult(final String encodedData, final Node root) {
        this.encodedData = encodedData;
        this.root = root;
    }

    public Node getRoot() {
        return this.root;
    }

    public String getEncodedData() {
        return this.encodedData;
    }
}

Class HuffmanEncoder:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanEncoder {

    private static final int LETTER_SIZE = 256;

    public HuffmanEncodedResult compress(final String data){
        final int[] freqTable = buildFreqTable(data);
        final Node root = buildHuffmanTree(freqTable);
        final Map<Character,String> lookupTable = buildLookupTable(root);

        return new HuffmanEncodedResult(generateEncodedData(data, lookupTable), root);
    }

    private static String generateEncodedData(String data, Map<Character, String> lookupTable) {
        final StringBuilder builder = new StringBuilder();
        for(final char character : data.toCharArray()){
            builder.append(lookupTable.get(character));
        }
        return builder.toString();
    }

    private static Map<Character,String> buildLookupTable(final Node root){
        final Map<Character,String> lookupTable = new HashMap<>();
        buildLookupTableImpl(root,"",lookupTable);

        return lookupTable;
    }

    private static void buildLookupTableImpl(Node node, String s, Map<Character, String> lookupTable) {
        if(!node.isLeaf()){
            buildLookupTableImpl(node.leftChild,s + '0', lookupTable);
            buildLookupTableImpl(node.rightChild,s + '1', lookupTable);
        }else{
            lookupTable.put(node.character,s);
        }
    }

    private static Node buildHuffmanTree(int[] freq){
        final PriorityQueue<Node> pq = new PriorityQueue<>();
        for(char i = 0; i<LETTER_SIZE; i++){
            if(freq[i] > 0){
                pq.add(new Node(i,freq[i],null,null));
            }
        }

        if(pq.size() == 1){
            pq.add(new Node('\0',0, null, null));
        }

        while (pq.size() > 1){
            final Node left = pq.poll();
            final Node right = pq.poll();
            assert right != null;
            assert left != null;
            final Node parent = new Node('\0',left.frequency + right.frequency, left, right);
            pq.add(parent);
        }
        return pq.poll();
    }

    private static int[] buildFreqTable(final String data){
        final int[] freq = new int[LETTER_SIZE];
        for(final char character :data.toCharArray()){
            freq[character]++;
        }

        return freq;
    }

    public String decompress(HuffmanEncodedResult result){

        final StringBuilder resultBuilder = new StringBuilder();
        Node current = result.getRoot();
        int i = 0;

        while(i<result.getEncodedData().length()){
            while(!current.isLeaf()){
                char bit = result.getEncodedData().charAt(i);
                if(bit == '1'){
                    current = current.rightChild;
                }else if(bit == '0'){
                    current = current.leftChild;
                }else{
                    throw new IllegalArgumentException("Invalid Bit in Message! " + bit);
                }
                i++;
            }
            resultBuilder.append(current.character);
            current = result.getRoot();
        }

        return resultBuilder.toString();
    }

    private byte[] toByteArray(String input){
        //to charArray
        char[] preBitChars = input.toCharArray();
        int bitShortage = (8 - (preBitChars.length%8));
        char[] bitChars = new char[preBitChars.length + bitShortage];
        System.arraycopy(preBitChars, 0, bitChars, 0, preBitChars.length);

        for (int  i= 0;  i < bitShortage;  i++) {
            bitChars[preBitChars.length + i]='0';
        }

        //to bytearray
        byte[] byteArray = new byte[bitChars.length/8];
        for(int i=0; i<bitChars.length; i++) {
            if (bitChars[i]=='1'){
                byteArray[byteArray.length - (i/8) - 1] |= 1<<(i%8);
            }
        }
        return byteArray;
    }

    public static void main(String[] args) {
        final String test = "abcdeffg";
        final HuffmanEncoder encoder = new HuffmanEncoder();
        final HuffmanEncodedResult result = encoder.compress(test);
        System.out.println("Encoded Message: " + result.encodedData);
        System.out.println("Unencoded Message: " + encoder.decompress(result));
    }
}

Class Node:

public class Node implements Comparable<Node>{
    protected final char character;
    protected final int frequency;
    protected final Node rightChild;
    protected final Node leftChild;

    public Node(char character, int frequency, Node leftChild, Node rightChild) {
        this.character = character;
        this.frequency = frequency;
        this.rightChild = rightChild;
        this.leftChild = leftChild;
    }

    public boolean isLeaf(){
        return this.leftChild == null && this.rightChild == null;
    }

    @Override
    public int compareTo(Node o) {
        final int frequencyComparison = Integer.compare(this.frequency,o.frequency);
        if(frequencyComparison != 0){
            return frequencyComparison;
        }
        return Integer.compare(this.character,o.character);
    }
}

Console Output:

"C:\Program Files\Java\jdk-15.0.1\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2020.3.1\lib\idea_rt.jar=62965:C:\Program Files\JetBrains\IntelliJ IDEA 2020.3.1\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\ayber\Desktop\untitled\out\production\untitled HuffmanEncoder
Encoded Message: 1001010000010101111011
Unencoded Message: abcdeffg

Process finished with exit code 0

Solution

  • You need to write the data as a byte array using FileOutputStream.write and with the byte array as a binary representation of your string of ones and zeros.

    One way of converting your string to bits is to use BitSet.

    To create a BitSet of the correct size use:

    BitSet b = new BitSet(string.length());
    

    Then you can loop over the string and set the bits:

    for (int i = 0; i < string.length(); i++) {
        if (string.charAt(i) == '1') {
            b.set(i);
        }
    }
    

    Then you can convert the bitset to a byte array and write it to a file:

    byte[] bytes = b.toByteArray();
    FileOutputStream f = new FileOutputStream(filename);
    f.write(bytes);
    

    However, I would note that creating a string and then converting it to a bitset and then converting it to a byte array is not very efficient. It might be better to work with a bitset from the beginning.