Search code examples
cstringcompressionlzw

Array of variable length strings for lzw compression


Right here is the function itself. I'm having a segfault in there because apparently I'm unable to assign the string to that value in the array. clang/gcc both give me a warning. Clang's is a bit better which is "expecting char assigning char *". I don't know of any other way to work with that dictionary, as everything I've tried isn't working. I'm also going to include all helper functions for it, but I'm pretty sure it's in this function itself.

As per usual, I'll upvote any answer that works, and I'll accept the one that I personally chose. Anyway I'm going to post the rest of the "helper" functions below.

void lzw_compress(char *uncompressed_data,size_t uncompressed_length,char *out){
unsigned long i=0,j=0;
char *character=malloc(1);
char *word=malloc(65535);
char *word_character=malloc(65535);
unsigned long word_size=0;
long *tmp_buffer=malloc(65535);
char *dictionary=malloc(130000);
for(i=0;i<=255;++i){
    dictionary[i]=base_dictionary[i];
}
long index=0;  
unsigned long dictionary_size=256;
for(i=0;i<uncompressed_length;++i){
    character[0]=(unsigned char )uncompressed_data[i];
    //arrcat(word_character,word,word_size,character);
    for(j=0;j<word_size;++j){
        word_character[j]=word[j];
    }
    word_character[j]=*character;
    index=search(dictionary,dictionary_size,word_character);
    if(index!=-1){
        for(j=0;j<(word_size+1);++j){
            word[j]=word_character[j];
        }
        ++word_size;
    }
    else{
        tmp_buffer[j++]=index;
        ++dictionary_size;
        //dictionary[dictionary_size++]=(unsigned long *)word_character;

            dictionary[dictionary_size]=*word_character;

        word=*character;
        word_size=1;
    }
}
if(memcmp(word,"",1)!=0){
    tmp_buffer[j++]=search(dictionary,dictionary_size,word);
}
    char *debug="";
    for(i=0;i<j;++i){
        sprintf(debug,"%s%lu,",debug,tmp_buffer[i]);
    }
    printf("%s",debug);

}

long search(char *table,unsigned long table_length,char
*search_value){
    unsigned long i=0;
    for(i=0;i<table_length;++i){
        if(table[i]==*search_value){
            return i;
        }
    }
    return -1; 
 }

So as you can see I'm trying to do an lzw-like program in pure c. I always compile with -Wall -std=c99(because I occasionally use p99.h for preprocessor macro abuse). But for some reason I'm unable to get my array of strings to work, I know I've used code similar to it(but apparently I didn't back it up...) but anyway yeah. I'm unable to figure out how I'm supposed to be doing it(properly). I'll greatly appreciate anyone's help with this issue.

As per normal whatever code I post on here, is public domain unless otherwise stated, and once I get the entire thing working I post it here so that anyone else looking for it can get it working too.

Finally thanks for reading this thread, and for heping me(if you know how to). Once I get back after going to town(if there's already an answer), I'll check it/mark off things then. But don't let that discourage you, because yours may be a better solution than the one that I chose and you'll still get an upvote.

Edit 1:Edited the code to what it was previously(according to git).

Edit 2: Fixed up a lot of things, got it looking better. Still the array comparison function isn't working(for some odd reason).


Solution

  • Now that you have the allocations, there are a few points that can be determined as errors from that:

    for(i=0;i<uncompressed_length;++i){
        character[0]=(unsigned char )uncompressed_data[i];
        //arrcat(word_character,word,word_size,character);
        for(j=0;j<word_size;++j){
            word_character[j]=word[j];
        }
    

    Initially, the memory word points to is uninitialised, and word_size is 1. So you copy the indeterminate char word[0] to word_character[0]. I'm not sure whether you should set word_size = 0 initially, or move that copying loop, or something else.

    word_character[j]=character;
    

    You're assigning a char* to a char. You probably meant word_character[j] = *character; there (or character[0] instead of *character, which is equivalent).

      dictionary[dictionary_size]=word_character;
    

    Again assigning a char* to a char. I can't guess what you wanted here, since dictionary_size is not changed in the loop. Maybe you wanted to increment dictionary_size and copy the word_character string?

        word=character;
        word_size=1;
    

    Here you're losing the handle to the memory that was allocated to word initially - commonly known as a memory leak - and let word point to a memory block that has enough space for one character. You probably meant to copy the pointed-to character,

    word[0] = character[0];
    

    there?


    Initial answer for original code:

    void lzw_compress(char *uncompressed_data,size_t uncompressed_length,char *out){
    unsigned long i=0,j=0;
    char *character;
    char *word;
    char *word_character;
    unsigned long word_size=1;
    long *tmp_buffer=malloc(65535);
    char *dictionary;
    for(i=0;i<=255;++i){
        dictionary[i]=base_dictionary[i];
    }
    

    You haven't allocated any memory for dictionary to point to, this is undefined behaviour with a non-zero segfault probability.

    long index=0;  
    unsigned long dictionary_size=256;
    for(i=0;i<uncompressed_length;++i){
        character[0]=(unsigned char )uncompressed_data[i];
    

    You haven't allocated memory for character either, again undefined behaviour.

        //arrcat(word_character,word,word_size,character);
        for(j=0;j<word_size;++j){
            word_character[j]=word[j];
        }
    

    word_character and word don't point to allocated memory either, more undefined behaviour.

        word[j]=(unsigned long)character;
    

    You're casting a char* to unsigned long and assign that value to a (non-allocated) char. Even if word[j] was valid memory, what's the intention here?

        index=search(dictionary,dictionary_size,word_character);
        if(index!=-1){
            for(j=0;j<(word_size+1);++j){
                word[j]=word_character[j];
            }
            ++word_size;
        }
        else{
            tmp_buffer[j++]=index;
            ++dictionary_size;
            //dictionary[dictionary_size++]=(unsigned long *)word_character;
          for(j=0;j<word_size;++j){
                dictionary[dictionary_size]=word_character;
           }
            word=character;
            word_size=1;
        }
    }
    if(memcmp(word,"",sizeof word)!=0){
    

    sizeof word is the size of a char*. You probably intended to use the length of the string here.

        tmp_buffer[j++]=search(dictionary,dictionary_size,word);
    }
        char *debug="";
        for(i=0;i<j;++i){
            sprintf(debug,"%s%lu,",debug,tmp_buffer[i]);
    

    calling sprintf with overlapping source and destination is undefined behaviour. In this case, it is even a string literal. String literals aren't modifiable, so that's another source for undefined behaviour, and a likely crash due to trying to modify a string literal.

        }
        printf("%s",debug);
    
    }