Search code examples
cdatabasegeneric-programmingvoid-pointers

Generic Lookup Function in C database


So I am trying to create a database in C using hash tables(where my hastable is just an array of LinkedLists with void pointers). I am also trying to make it "generic" using void pointers but I don't know how to do lookup and delete with void pointers. For example I have a tuple named CSG (Course StudentID Grade) defined as

typedef struct CSG {
  char Course[6];
  int StudentId;
  char Grade[2];
  int (*hash)(int StudentId);
}CSG;

where the function pointer just points to a simple hash function I wrote.

lookup function for CSG currently is

LinkedList*  lookup_CSG(hashtable* db,int StudentId, char* Grade, char* Course){
    LinkedList* ret=new_LinkedList();
    if(StudentId!=0){
        ListNode* curr=db->table[int_hash(StudentId)]->head;
        while(curr!=NULL){
            CSG* currentCSG=(CSG*)curr->data;
            if(currentCSG->StudentId==StudentId){
                append(ret, currentCSG);
                return ret;
            }
            else{
                curr=curr->next;
            }
        }
    }
    else if(Grade[0]!='*'&&Course[0]!='*'){
        for(int i=0;i<1009;i++){
            ListNode* curr=db->table[i]->head;
            if(curr==NULL){
                continue;
            }
            while(curr!=NULL){
                if(curr==NULL){
                    break;
                }
                CSG* currentCSG=(CSG*)curr->data;
                if(currentCSG==NULL){
                    break;
                }
                if(strcmp(Grade, currentCSG->Grade)==0&&strcmp(Course, currentCSG->Course)==0){
                    append(ret, currentCSG);
                }
                curr=curr->next;
            }
        }
    }
    else if(Grade[0]!='*'){
        for(int i=0;i<1009;i++){
            ListNode* curr=db->table[i]->head;
            if(curr==NULL){
                continue;
            }
            while(curr!=NULL){
                if(curr==NULL){
                    break;
                }
                CSG* currentCSG=(CSG*)curr->data;
                if(currentCSG==NULL){
                    break;
                }
                if(strcmp(Grade, currentCSG->Grade)==0){
                    append(ret, currentCSG);
                }
                curr=curr->next;
            }
        }
    }
    else if(Course[0]!='*'){
        for(int i=0;i<1009;i++){
            ListNode* curr=db->table[i]->head;
            if(curr==NULL){
                continue;
            }
            while(curr!=NULL){
                if(curr==NULL){
                    break;
                }
                CSG* currentCSG=(CSG*)curr->data;
                if(currentCSG==NULL){
                    break;
                }
                if(strcmp(Course, currentCSG->Course)==0){
                    append(ret, currentCSG);
                }
                curr=curr->next;
            }
        }
    }
    return ret;
}

Where the " * " or 0 char if for omitting a certain parameter ie if you call lookup_CSG(db,0 "A-","*") will return a linked list of all the tuples where the grade is A-.

Which works fine but with this model I will have to write a specific lookup and delete function for every tuple. Does anyone know if there is a way that I can create a "generic" lookup function. I am running into trouble when I try this because each tuple has different attributes so the comparisons will be different as will the castings. The lookup function will also take different parameters depending on the tuple. Thanks!


Solution

  • There are two ways to approach this:

    1. Use void* for all value types and parameters, or
    2. Write a bunch of macros which will generate strongly-types hashtables for all input types (like what's done in klib).

    For the first approach, you will need two function pointers accepting void*, which can be typedef'd using:

    // returns a 32-bit hash value for the specified key
    typedef int32_t (*HashGetterFn)(const void*);
    
    // return 'true' if two parameters are equal
    typedef bool (*EqualityFn)(const void*, const void*);
    
    // you can also place these in a struct
    typedef struct 
    {
        HashGetterFn getHash;
        EqualityFn equals;
    }
    HashFuncs;
    

    This allows you to use the same function signature for any key you like, although you will have to pass it by reference in your functions.

    So, if your key is an int, you would write these two functions:

    // a hash implementation for int keys
    static int32_t int_hash(const void * input)
    {
         // cast the void pointer to the actual key pointer
         const int * value = (const int *)input;
    
         // dereference it and calculate the hash
         return (int32_t)*value;
    }
    
    static bool int_equals(const void * a, const void * b)
    {
         // cast the void pointers to actual key pointers
         const int * aval = (const int *)a;
         const int * bcal = (const int *)b;
    
         // dereference and compare
         return *aval == *bval;
    }
    
    // and then you use it somewhere
    void some_function(int *item)
    {
        // wrap these two functions
        HashFuncs hashfn = { .getHash = int_hash, .equals = int_equals };
    
        // create a hashtable which will use them 
        HashTable hashtable = hashtable_create(hashfn);
    
        // add a key/value pair
        hashtable_add(&hashtable, (void*)item);
    }
    

    You might have noticed two issues already:

    1. Your values need to have static duration or be allocated on the heap, because they are passed by reference. Alternatively, you also need to pass the size of the struct to the hash table and have all its functions memcpy the value of each struct instance into the internal table.

    2. Nothing prevents you from passing a completely wrong value to your hash table functions, since they accept void pointers to allow them to work with any type. This means that functions which compile without issues might fail in runtime, when one of your hash funcions dereferences a wrong pointer type.

    To mitigate this, the approach taken by klib is to use the preprocessor to generate a list of strongly typed functions and structs for each specific key/value pair you want to use. This approach is also not perfect; it's a pain to write and test there numerous multi-line macros, but since this library has already been written and you might as well just reuse it.

    This is basically one of the issues that generic programming fixes elegantly in many modern languages (templates in C++, generics in C# or Java).