Search code examples
chashhashtable

Why does my hash table program keep crashing?


I'm trying to create a program that reads a dictionary and then stores the words into the hash table, then read another file checks every word of that file if it is in the hash table if it is not then it will be outputted as a misspelled word. I'm first trying to check if I can load the dictionary file into my hash table and then output the words in the hash table yet my code seems to crash whenever I try to run it. The hash function I use was taken from the Internet. I'm also still very new with data structures, and having a hard time understanding.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;

typedef struct node
{
    char* word;
    struct node *next;
}
node;

node *table[10];

// hash function
unsigned int hash(char *word)
{
// TODO
    unsigned int hash = 5381;
    int c = 0;

    while (c == *word++)
        hash = ((hash << 5) + hash) + c;

    return hash % 10;
}

int main(void)
{
    // initialize array heads to NULL
    for (int i = 0; i < N; i++)
    {
        table[i] = NULL;
    }

    // Open file to read
    FILE *indata = fopen(dictionary, "r");   
    if (indata == NULL)
    {
        printf("cant open\n");
        return 1;
    }

    // variable to store words read from the file
    char *words = malloc(sizeof(char) * 20);
    if (words == NULL)
    {
        printf("no memory\n");
        return 1;
    }

    // While loop to read through the file
    while (fgets(words, 20, indata))
    {
        // get the index of the word using hash function
        int index = hash(words);

        // create new node
        node *newNode = malloc(sizeof(node));
        if (newNode == NULL)
        {
            printf("here\n");
            return 1;
        }

        // make the new node the new head of the list
        strcpy(newNode->word, words);
        newNode->next = table[index];
        table[index] = newNode;

        // free memory
        free(newNode);
    }
    // free memory
    free(words);
    // loop to print out the values of the hash table
    for (int i = 0; i < N; i++)
    {
        node *tmp = table[i];
        while (tmp->next != NULL)
        {
            printf("%s\n", tmp->word);
            tmp = tmp->next;
        }
    }

    // loop to free all memory of the hash table
    for (int i = 0; i < N; i++)
    {
        if (table[i] != NULL)
        {
            node *tmp = table[i]->next;
            free(table[i]);
            table[i] = tmp;
        }
    }

    // close the file
    fclose(indata);
}

Solution

  • At least three bugs that independently caused a segfault:

    First, newNode->word is used unitialized, so it points to random memory, so the strcpy would segfault. Better to use strdup

    Also, after you put newNode in the table, you do free(newNode) making what it points to invalid. This causes the second loop to segfault

    Third, in the second loop, if table[i] is null, the while (tmp->next != NULL) will segfault

    I've annotated and corrected your code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    // file to read
    #define dictionary "dictionary.txt"
    
    // No. of buckets
    const unsigned int N = 10;
    
    typedef struct node {
        char *word;
        struct node *next;
    } node;
    
    node *table[10];
    
    // hash function
    unsigned int
    hash(char *word)
    {
    // TODO
        unsigned int hash = 5381;
        int c = 0;
    
        while (c == *word++)
            hash = ((hash << 5) + hash) + c;
    
    // NOTE: not a bug but probably better
    #if 0
        return hash % 10;
    #else
        return hash % N;
    #endif
    }
    
    int
    main(void)
    {
        // initialize array heads to NULL
        for (int i = 0; i < N; i++) {
            table[i] = NULL;
        }
    
        // Open file to read
        FILE *indata = fopen(dictionary, "r");
    
        if (indata == NULL) {
            printf("cant open\n");
            return 1;
        }
    
        // variable to store words read from the file
        char *words = malloc(sizeof(char) * 20);
    
        if (words == NULL) {
            printf("no memory\n");
            return 1;
        }
    
        // While loop to read through the file
        while (fgets(words, 20, indata)) {
            // get the index of the word using hash function
            int index = hash(words);
    
            // create new node
            node *newNode = malloc(sizeof(node));
    
            if (newNode == NULL) {
                printf("here\n");
                return 1;
            }
    
            // make the new node the new head of the list
    // NOTE/BUG: word is never set to anything valid -- possible segfault here
    #if 0
            strcpy(newNode->word, words);
    #else
            newNode->word = strdup(words);
    #endif
            newNode->next = table[index];
            table[index] = newNode;
    
            // free memory
    // NOTE/BUG: this will cause the _next_ loop to segfault -- don't deallocate
    // the node you just added to the table
    #if 0
            free(newNode);
    #endif
        }
    
        // free memory
        free(words);
    
        // loop to print out the values of the hash table
        for (int i = 0; i < N; i++) {
            node *tmp = table[i];
    // NOTE/BUG: this test fails if the tmp is originally NULL (i.e. no entries
    // in the given hash index)
    #if 0
            while (tmp->next != NULL) {
    #else
            while (tmp != NULL) {
    #endif
                printf("%s\n", tmp->word);
                tmp = tmp->next;
            }
        }
    
        // loop to free all memory of the hash table
        for (int i = 0; i < N; i++) {
            if (table[i] != NULL) {
                node *tmp = table[i]->next;
    
                free(table[i]);
                table[i] = tmp;
            }
        }
    
        // close the file
        fclose(indata);
    }
    

    UPDATE:

    I made a linked list program before that stores an integer in the list, int number; struct node *next; and I used newNode->number = 5; and it worked, why is it in this case it doesn't?? Is it because I am working with strings here??

    The difference is that word is a pointer. It must be assigned a value before it can be used. strcpy does not assign a value to word. It tries to use the contents of word as the destination address of the copy.

    But, the other two bugs happen regardless of word being a char * vs number being int.

    If you had defined word not as a pointer, but as a fixed array [not as good in this usage], the strcpy would have worked. That is, instead of char *word;, if you had done (e.g.) char word[5];

    But, what you did is better [with the strdup change] unless you can guarantee that the length of word can hold the input. strdup will guarantee that.

    But, notice that I [deliberately] made word have only five chars to illustrate the problem. It means that the word to be added can only be 4 characters in length [we need one extra byte for the nul terminator character]. You'd need to use strncpy instead of strcpy but strncpy has issues [it does not guarantee to add the nul char at the end if the source length is too large].

    Conincidentally, there is another question today that has an answer that may help shed some more light on the differences of your word struct member: Difference between memory allocations of struct member (pointer vs. array) in C