
How to decompress from Huffman's compression in C


I am developing a program that decompresses a file passed as a parameter, which was previously compressed with the Huffman algorithm, but my decompression function does not work. Can you help me?

Here is the format of the compressed file:

110;;1100;o1101000; 1101001;f1101010;r1101011;
1101100;�1101101;�1101110;{1101111;�1110000;�1110001;�1110010;�1110011;@1110100;61110101;m11101100;h11101101;l11101110;e11101111;01111;
���;o�;�����E}�j�U����͛wo�Ǘ>�@6

I have one function that reads the file and another that parses the Huffman code table (at the top of the file).
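The sample table appears to store each entry as one character followed by its code as ASCII '0'/'1' digits and a terminating ';' (for example o1101000;). Assuming that layout, which is inferred from the sample and not confirmed, a single entry could be turned into the question's cipher_t roughly like this. parse_cipher_entry is a hypothetical name, and since the sample does not show where the table ends and the encoded bits begin, the sketch handles one entry at a time.

```c
#include <stdlib.h>
#include <string.h>

/* Mirrors the question's cipher_t: c is the character, code is an array of
 * 0s and 1s terminated by -1. */
typedef struct cipher {
    unsigned char c;
    int *code;
} cipher_t;

/* Hypothetical helper. ASSUMED entry layout: <char><'0'/'1' digits>';'.
 * Parses one entry starting at *pos in buf (length len). On success fills
 * *out, advances *pos past the ';' and returns 0; returns -1 on malformed
 * input or allocation failure. */
int parse_cipher_entry(const unsigned char *buf, size_t len, size_t *pos,
                       cipher_t *out)
{
    size_t p = *pos;
    if (p >= len)
        return -1;
    unsigned char c = buf[p++];          /* the character comes first */
    size_t start = p;
    while (p < len && (buf[p] == '0' || buf[p] == '1'))
        p++;                             /* then its code as ASCII digits */
    size_t nbits = p - start;
    if (nbits == 0 || p >= len || buf[p] != ';')
        return -1;                       /* entry must end with ';' */
    int *code = malloc((nbits + 1) * sizeof *code);
    if (code == NULL)
        return -1;
    for (size_t k = 0; k < nbits; k++)
        code[k] = buf[start + k] - '0';  /* '0'/'1' -> 0/1 */
    code[nbits] = -1;                    /* -1 terminator, as in the question */
    out->c = c;
    out->code = code;
    *pos = p + 1;                        /* skip the ';' */
    return 0;
}
```

Because the character is consumed before any digits are examined, entries such as ;1100; (character ';') and 01111; (character '0') are handled under this assumed layout.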

My decompression function:

unsigned char *read_bits_from_compressed(unsigned char *str, list_t *code) {
    int str_len = strlen(str);
    unsigned char padding = str[str_len - 1];
    int padding_bits = padding >> 4;
    int bits_read = 0;
    int curr_byte = 0;
    int curr_bit = 7;
    int bit = 0;
    int i = 0;
    int j = 0;
    node_t *node = code->head;
    cipher_t *cipher = NULL;
    unsigned char *result = malloc(str_len);
    memset(result, 0, str_len);
    for (i = str_len - 2; i >= 0; i--) {
        curr_byte = str[i];
        for (j = 7; j >= 0; j--) {
            bit = (curr_byte >> j) & 1;
            while (node != NULL && bits_read < padding_bits) {
                node = node->next;
                bits_read++;
            }
            while (node != NULL) {
                cipher = (cipher_t *) node->data;
                if (cipher->code[curr_bit] == bit) {
                    curr_bit--;
                    if (cipher->code[curr_bit + 1] == -1) {
                        result[str_len - padding - 1 - i] = cipher->c;
                        node = code->head;
                        curr_bit = 7;
                        break;
                    }
                } else {
                    node = node->next;
                    curr_bit = 7;
                }
            }
        }
    }
    return result;
}

The function must do the following:

  1. Invert the string and read it from the end.
  2. The first character of the inverted string (the last byte of the data) holds the padding; retrieve it.
  3. Read the data bit by bit, ignoring the padding bits.
  4. Store the bit representation in an array.
  5. Look up the character corresponding to the bits in the array.
  6. Write that character to the output file.
  7. Repeat until the end of the compressed data (detecting and skipping the Huffman code table).
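Steps 2 and 3 boil down to walking the data one bit at a time while skipping the padding. Below is a minimal bit reader, assuming the bits are stored most significant bit first and that the padding bits are the low-order bits of the last data byte. Unlike step 1, it reads forward from the first byte; that is a deliberate swap, assuming the encoder wrote the codes from the first byte onward. bit_reader_t, bit_reader_init and bit_reader_next are hypothetical names, and the buffer length is passed explicitly rather than computed with strlen().

```c
#include <stddef.h>

/* Hypothetical bit reader over a buffer of known length. */
typedef struct bit_reader {
    const unsigned char *buf;
    size_t total_bits;   /* meaningful bits, padding excluded */
    size_t pos;          /* index of the next bit to read */
} bit_reader_t;

/* ASSUMPTION: MSB-first bit order, padding in the low bits of the last byte. */
void bit_reader_init(bit_reader_t *br, const unsigned char *buf, size_t len,
                     unsigned padding_bits)
{
    br->buf = buf;
    br->total_bits = len * 8 >= padding_bits ? len * 8 - padding_bits : 0;
    br->pos = 0;
}

/* Returns the next bit (0 or 1), or -1 once all meaningful bits are consumed. */
int bit_reader_next(bit_reader_t *br)
{
    if (br->pos >= br->total_bits)
        return -1;
    unsigned char byte = br->buf[br->pos / 8];
    int bit = (byte >> (7 - br->pos % 8)) & 1;  /* bit 7 of each byte first */
    br->pos++;
    return bit;
}
```

Returning -1 at the end gives the caller a clean stop condition, so the padding never reaches the code lookup.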

Here are the structures of the linked list:

The list:

typedef struct list {
    node_t *head;
    node_t *tail;
    size_t size;
} list_t;

The nodes:

typedef struct node {
    struct node *prev;
    struct node *next;
    void *data;
} node_t;

The data contained in the nodes:

typedef struct cipher {
    unsigned char c;
    int *code;
} cipher_t;

c holds the character and code its Huffman code (an array of 1s and 0s terminated by -1).
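With codes stored as -1-terminated int arrays, the lookup in step 5 needs to check whether the bits read so far form a complete code, not just a prefix of one. A minimal sketch over the list_t, node_t and cipher_t types above (redeclared so the snippet compiles on its own); find_cipher is a hypothetical helper. Because Huffman codes are prefix-free, at most one code can match exactly.

```c
#include <stddef.h>

typedef struct node {
    struct node *prev;
    struct node *next;
    void *data;
} node_t;

typedef struct list {
    node_t *head;
    node_t *tail;
    size_t size;
} list_t;

typedef struct cipher {
    unsigned char c;
    int *code;   /* 0s and 1s, terminated by -1 */
} cipher_t;

/* Returns the cipher whose code equals exactly bits[0..n-1],
 * or NULL if no code matches yet (more bits are needed). */
const cipher_t *find_cipher(const list_t *code, const int *bits, size_t n)
{
    for (const node_t *node = code->head; node != NULL; node = node->next) {
        const cipher_t *cipher = node->data;
        size_t k = 0;
        /* Stops at the -1 terminator too, since -1 never equals a bit. */
        while (k < n && cipher->code[k] == bits[k])
            k++;
        /* Match only if all n bits agree AND the code ends right here. */
        if (k == n && cipher->code[k] == -1)
            return cipher;
    }
    return NULL;
}
```

Each comparison starts at index 0 of the code array and stops at the -1 terminator, so codes of different lengths are compared safely.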

My function currently returns an empty string.


Solution

  • You cannot compute the length of the compressed array with int str_len = strlen(str);. str points to binary data that may contain embedded null bytes that are meaningful. You should pass the length as an extra argument to read_bits_from_compressed.

    As a matter of fact, the compiler should have complained that you pass an unsigned char * to strlen(), which expects a const char *. Do not ignore compiler warnings.

    Furthermore, you allocate the decompressed string with unsigned char *result = malloc(str_len);. There is no guarantee that the decompressed string has the same length as the compressed buffer: it may be longer or shorter, depending on the Huffman tree and the uncompressed values. Note also that you must allocate one extra byte for the null terminator if you intend to produce a C string.
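Putting these points together, a corrected decoder might look like the sketch below. It is not the asker's code: decode_huffman, its parameters, and the bit layout (MSB first, padding in the low-order bits of the last byte, code table already stripped from data) are assumptions. It takes the compressed length explicitly instead of calling strlen(), grows the output with realloc() instead of guessing its size, and always keeps room for the null terminator. It also reports the decoded length through decoded_len, since the original data may itself contain 0 bytes.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node { struct node *prev; struct node *next; void *data; } node_t;
typedef struct list { node_t *head; node_t *tail; size_t size; } list_t;
typedef struct cipher { unsigned char c; int *code; } cipher_t; /* code ends with -1 */

/* Hypothetical decoder. Returns a malloc'ed, null-terminated buffer (caller
 * frees) and stores its length in *decoded_len (if not NULL); returns NULL on
 * allocation failure or if the bits do not decode with the given table. */
unsigned char *decode_huffman(const unsigned char *data, size_t len,
                              unsigned padding_bits, const list_t *code,
                              size_t *decoded_len)
{
    if (padding_bits > len * 8)
        return NULL;
    size_t total_bits = len * 8 - padding_bits;

    size_t cap = 64, out_len = 0;
    unsigned char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    int bits[256];          /* bits of the code currently being read */
    size_t nbits = 0;

    for (size_t pos = 0; pos < total_bits; pos++) {
        /* MSB-first: bit 7 of each byte comes first. */
        bits[nbits++] = (data[pos / 8] >> (7 - pos % 8)) & 1;

        /* Find the symbol whose code is exactly bits[0..nbits-1]. */
        const cipher_t *match = NULL;
        for (const node_t *n = code->head; n != NULL && match == NULL; n = n->next) {
            const cipher_t *ci = n->data;
            size_t k = 0;
            while (k < nbits && ci->code[k] == bits[k])
                k++;
            if (k == nbits && ci->code[k] == -1)
                match = ci;
        }

        if (match != NULL) {
            if (out_len + 1 >= cap) {       /* keep one byte for '\0' */
                unsigned char *tmp = realloc(out, cap * 2);
                if (tmp == NULL) {
                    free(out);
                    return NULL;
                }
                out = tmp;
                cap *= 2;
            }
            out[out_len++] = match->c;
            nbits = 0;                      /* start reading the next code */
        } else if (nbits == sizeof bits / sizeof bits[0]) {
            free(out);                      /* longer than any code: corrupt */
            return NULL;
        }
    }

    if (nbits != 0) {                       /* trailing bits form no complete code */
        free(out);
        return NULL;
    }
    out[out_len] = '\0';
    if (decoded_len != NULL)
        *decoded_len = out_len;
    return out;
}
```

For example, with the codes a = 0, b = 10 and c = 11, the single byte 0x58 (01011000) with 2 padding bits decodes to "abca". Scanning the whole list after every bit is simple but slow; walking the Huffman tree one bit at a time is the usual faster alternative.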