Search code examples
cfreemallochashtablevoid-pointers

A few questions about my modular code using void* as dynamic data type in C


A few days ago I posted this question and everyone suggested me to use void*, which I did. I think some of them also pointed a few things that I would need to take care of but I'm not sure what exactly were they. However, I'm having a few problems with this...

I'm not going to post all my code where cause it's quite large, instead, I'll post the things that I think are important and hopefully are enough for you to help me out.

My hash table structure is like this:

typedef void * HashKey;
typedef void * HashValue;

typedef struct sHashItem {
    HashKey key;
    HashValue value;

    char status;
} HashItem;

typedef struct sHashTable {
    HashItem *items;

    int count;
    float load;
    int size;

    Bool (*compare)(HashKey, HashKey);
    unsigned (*hash)(void *);
} HashTable;

The signature for my insert function goes like this:

Bool hashInsert(HashTable * const table, HashKey key, HashValue value);

And somewhere inside that function, when I find a free bucket in the hash table, I do this:

table->items[index].key = key;
table->items[index].value = value;
table->items[index].status = USED;
table->load = ++table->count / (float)table->size;

All this presents a few problems:

1) As you can see above I'm simply setting each key/value pair of the free bucket to the same pointer passed as the key/value hashInsert function arguments. This presents a problem as you may have already noticed... For instance, doing something like this:

char str[50];
scanf("%s%*c", str);
hashInsert(t1, (HashKey)str, (HashValue)5);
scanf("%s%*c", str);
hashInsert(t1, (HashKey)str, (HashValue)3);

And if the input is "KeyA" and then "KeyB", both will have "KeyB" as the buckets keys. The same thing applies to the value and not just the key since they are basically the same type because I want to have my code fully modular, for any data type.

How could I solve this?

My first though is to use strdup(str) and pass that to the hashInsert function. That would solve the problem. And since this was handled in the main code, I could easily use malloc() too for any other data type I need to pass as the value (the key will probably always be a string or an int).

But this solution presents another problem...

2) How should I free this allocated memory? Sure, it was allocated by the "main programmer" and not the "hash table module programmer" so, the "main programmer" should free it in the main code, right? However, that doesn't look much like modular code to me.

My code also has a hashDestroy function, to free all the allocated memory. But how can I use this function to free everything? I can't just iterate over every key/value and use free() on them cause maybe some of them weren't malloc'd by any programmer in the first place and I don't need to free them.

How can I find out which ones my hashDestroy must free and which ones it shouldn't?

3) To finish, I guess I can also throw this issue into the mix... In point one, my suggestion was to use strdup() or malloc to "fix" that specific problem (while introducing another) but that also don't look very modular to me. This memory allocation should be done in the hash table module code and not in the main code by the "main programmer".

How do you suggest I solve this? I mean, the data types can be anything and while the use of strdup() helps a lot, it only works for strings. What if I need to allocate memory for some specific structure or just an int?

Sorry for the big post but I think these questions are all related and I need some help figuring them out since my C knowledge is not that extreme. I only recently learned about void* so...


Solution

  • Wow: this is going to take some answering in full. However, one of the key things you're going to need is the size of whatever it is you are processing - it is fine to use a void pointer, but you need to know how big the object whose address you're receiving is.

    [...] everyone suggested me to use void*, which I did. [...]

    My hash table structure is like this:

    typedef void * HashKey;
    typedef void * HashValue;
    
    typedef struct sHashItem {
        HashKey key;
        HashValue value;
        char status;
    } HashItem;
    
    typedef struct sHashTable {
        HashItem *items;
        int count;
        float load;
        int size;
    
        Bool (*compare)(HashKey, HashKey);
        unsigned (*hash)(void *);
    } HashTable;
    

    You are likely to need a size_t key_sz; and a size_t val_sz; member in HashItem. Your hash function pointer will need to know how big the key to be hashed is.

    I'm in two minds about what the HashKey should be. It depends in part on how you are using this stuff. It looks like you want:

    • Given this key value of my choosing,
    • Store/return this data that is associated with it.

    In which case, you probably also need to store the hash number somewhere in the HashItem; that is the value returned by your hashing function - apparently an unsigned integer. I'm not sure what the signature on the compare function (function pointer) should be; I'm suspicious that it should either take a pair of HashKey-and-size values, or possibly a pair of HashItem pointers.

    The signature for my insert function goes like this:

    Bool hashInsert(HashTable * const table, HashKey key, HashValue value);
    

    And somewhere inside that function, when I find a free bucket in the hash table, I do this:

    table->items[index].key = key;
    table->items[index].value = value;
    table->items[index].status = USED;
    table->load = ++table->count / (float)table->size;
    

    All this presents a few problems:

    1) As you can see above I'm simply setting each key/value pair of the free bucket to the same pointer passed as the key/value hashInsert function arguments. This presents a problem as you may have already noticed... For instance, doing something like this:

    char str[50];
    scanf("%s%*c", str);
    hashInsert(t1, (HashKey)str, (HashValue)5);
    scanf("%s%*c", str);
    hashInsert(t1, (HashKey)str, (HashValue)3);
    

    The key to using void * is to pass addresses around. Casting should be unnecessary in C. You also need to pass the size of things. Hence:

    Bool hashInsert(HashTable * const table, HashKey key, size_t key_sz,
                    HashValue value, size_t val_sz);
    
    char str[50];
    scanf("%s%*c", str);
    int value = 5;
    hashInsert(t1, str, strlen(str)+1, &value, sizeof(value));
    

    Internally, you will copy the data - not using 'strdup()' since you don't know that there aren't interior NUL '\0' bytes in it.

    And if the input is "KeyA" and then "KeyB", both will have "KeyB" as the buckets keys. The same thing applies to the value and not just the key since they are basically the same type because I want to have my code fully modular, for any data type.

    How could I solve this?

    You have to define who owns what, and whether (and how) the container copies data. In C++, the containers make a copy of whatever they are storing.

    My first thought is to use strdup(str) and pass that to the hashInsert function. That would solve the problem. And since this was handled in the main code, I could easily use malloc() too for any other data type I need to pass as the value (the key will probably always be a string or an int).

    You can't use 'strdup()' because in general, neither the values nor the keys are strings. If they are always strings, why are you using 'void *' instead of 'char *'?

    You can decide to copy the value and the key - as long as you know the sizes.

    But this solution presents another problem...

    2) How should I free this allocated memory? Sure, it was allocated by the "main programmer" and not the "hash table module programmer" so, the "main programmer" should free it in the main code, right? However, that doesn't look much like modular code to me.

    My code also has a hashDestroy function, to free all the allocated memory. But how can I use this function to free everything? I can't just iterate over every key/value and use free() on them cause maybe some of them weren't malloc'd by any programmer in the first place and I don't need to free them.

    How can I find out which ones my hashDestroy must free and which ones it shouldn't?

    You can't. You have to define a policy and only if that policy allows you to do the destruction should you do it. If you copy everything, you have an easy time. If you copy nothing, you have a different easy time (arguably easier), but your consumers have a hell of a time because they need a structure to keep track of what they need to release - perhaps a hashed list...

    3) To finish, I guess I can also throw this issue into the mix... In point one, my suggestion was to use strdup() or malloc to "fix" that specific problem (while introducing another) but that also don't look very modular to me. This memory allocation should be done in the hash table module code and not in the main code by the "main programmer".

    Yes...that's basically my recommendation.

    How do you suggest I solve this? I mean, the data types can be anything and while the use of strdup() helps a lot, it only works for strings. What if I need to allocate memory for some specific structure or just an int?

    Note that copying only does shallow copies. If the structures you're copying contain pointers, then the duplication code will only copy the pointer and not the pointed at data.

    So, a general solution will require some sort of copy function. You may have to demand that the user gives you a 'release' function that frees the memory in an item. You may need to have the user provide you with already fully allocated data. You need to think about who owns what the search function returns - is it still 'in' the hash table or has it been removed. Look hard at the C++ STL system - it generally speaking does an excellent job and modelling your requirements on what it requires may make sense. But remember, C++ has constructors and destructors to help it.