Search code examples
csegmentation-faulthashtablecs50

CS50 Pset5, runs fine with large dictionary but seg faults on small dictionary


I am having some problems with PSET 5 of the CS50 course. This is the speller problem. My code Segfaults when I try to load the small dictionary.

After debugging with gdb, it shows that the error occurs when I'm trying to fopen(dictionary, "r") on line 83.

bool load(const char *dictionary)
{
    // fopen dictionary file (remb to check if NULL)
    FILE *dict = fopen(dictionary, "r");
    if (dict == NULL)
    {
        printf("Could not open file\n");
        return false;
    }

It returns "Could not open file\n"). But I can't figure out why this is happening, can anyone help? Everything for the large dictionary seems to be running fine.

Full code:

// Implements a dictionary's functionality
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>
#include <ctype.h>
#include "dictionary.h"

// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

// Number of buckets in hash table
const unsigned int N = 256;

// Hash table
node *table[N];

// Word_count for size function
unsigned long word_count = 0;

// Returns true if word is in dictionary else false
bool check(const char *word)
{
    char small_word[LENGTH+1];
    for (int i = 0; i < LENGTH + 1; i++)
    {
        small_word[i] = 0;
    }
    
    for (int i = 0, n = strlen(word); i < n; i++)
    {
        if(isupper(word[i]))
        {
            small_word[i] = tolower(word[i]);
        }
        else
        {
            small_word[i] = word[i];
        }
    }
    // hash the word to get hash value
    unsigned long hvalue = hash(small_word);

    // point cursor to the first value in the table with index of hvalue
    node* cursor = table[hvalue];

    // as long as the strings don't match, we reassign cursor to the next value
    while (strcasecmp(small_word, cursor->word) != 0)
    {
        cursor = cursor->next;
        // since strings dont match, if they reach to the end where next is NULL
        if (cursor == NULL)
        {
            // then the word is misspelled
            return false;
        }
    }
    return true;
}

// Hashes word to a number
unsigned long hash(const char *word)
{
    // djb2 hash function as seen from http://www.cse.yorku.ca/~oz/hash.html
    unsigned long hash = 0;
    int c;

    while ((c = *word++))
    {
        hash = c + (hash << 6) + (hash < 16) - hash;
    }
    return (hash % N);
}

// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    // fopen dictionary file (remb to check if NULL)
    FILE *dict = fopen(dictionary, "r");
    if (dict == NULL)
    {
        printf("Could not open file\n");
        return false;
    }

    char storage[LENGTH + 1];
    for (int i = 0; i < LENGTH + 1; i++)
    {
        storage[i] = 0;
    }

    while (fscanf(dict, "%s", storage) != EOF)
    {
        // allocate space for buffer node
        node *buffer = malloc(sizeof(node));
        if (buffer == NULL)
        {
            printf("Error allocating memory\n");
            return 1;
        }
        // copy the string in storage into the buffer node
        strcpy(buffer->word, storage);

        // hash the word to obtain hvalue
        unsigned long hvalue = hash(buffer->word);
        
        // assigns buffer's next to point to the current words inside the table
        buffer->next = table[hvalue];
        
        // after making sure that buffer next keeps the address of the other words, you assign the table hvalue to the most recent word
        table[hvalue] = buffer;
        
        // increase word count by one
        word_count++;
        
    }

    fclose(dict);
    return true;
}

// Returns number of words in dictionary if loaded else 0 if not yet loaded
unsigned long size(void)
{
    return word_count;
}

// Unloads dictionary from memory, returning true if successful else false
bool unload(void)
{
    word_count = 0;

    for (int i = 0; i < N; i++)
    {
        node *cursor = table[i];
        while (cursor != NULL)
        {
            node *tmp = cursor;
            cursor = cursor->next;
            free(tmp);
        }
        
        free(cursor);
    }
        
    return true;
}

Solution

  • The "Could not open file" message is probably a typo on the command line (as mentioned in the comments, it doesn't produce a seg fault because it gives the message, then returns false to speller, and speller gives a message and exits).

    Try debug50 with a breakpoint set at node* cursor = table[hvalue]; in check. Use texts/cat.txt as the text file. Once you step into the next line, program will more than likely seg fault.

    The small dictionary (supplied with the distro code) does not populate all members of table. If table[index] is not populated, there is not such thing as table[index]->next or table[index]->word. This while (strcasecmp(small_word, cursor->word) != 0) therefore, is a seg fault waiting to happen.