I'm trying to create a program that reads a dictionary and then stores the words into the hash table, then read another file checks every word of that file if it is in the hash table if it is not then it will be outputted as a misspelled word. I'm first trying to check if I can load the dictionary file into my hash table and then output the words in the hash table yet my code seems to crash whenever I try to run it. The hash function I use was taken from the Internet. I'm also still very new with data structures, and having a hard time understanding.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;
typedef struct node
{
char* word;
struct node *next;
}
node;
node *table[10];
// hash function
unsigned int hash(char *word)
{
// TODO
unsigned int hash = 5381;
int c = 0;
while (c == *word++)
hash = ((hash << 5) + hash) + c;
return hash % 10;
}
int main(void)
{
// initialize array heads to NULL
for (int i = 0; i < N; i++)
{
table[i] = NULL;
}
// Open file to read
FILE *indata = fopen(dictionary, "r");
if (indata == NULL)
{
printf("cant open\n");
return 1;
}
// variable to store words read from the file
char *words = malloc(sizeof(char) * 20);
if (words == NULL)
{
printf("no memory\n");
return 1;
}
// While loop to read through the file
while (fgets(words, 20, indata))
{
// get the index of the word using hash function
int index = hash(words);
// create new node
node *newNode = malloc(sizeof(node));
if (newNode == NULL)
{
printf("here\n");
return 1;
}
// make the new node the new head of the list
strcpy(newNode->word, words);
newNode->next = table[index];
table[index] = newNode;
// free memory
free(newNode);
}
// free memory
free(words);
// loop to print out the values of the hash table
for (int i = 0; i < N; i++)
{
node *tmp = table[i];
while (tmp->next != NULL)
{
printf("%s\n", tmp->word);
tmp = tmp->next;
}
}
// loop to free all memory of the hash table
for (int i = 0; i < N; i++)
{
if (table[i] != NULL)
{
node *tmp = table[i]->next;
free(table[i]);
table[i] = tmp;
}
}
// close the file
fclose(indata);
}
At least three bugs that independently caused a segfault:
First, newNode->word
is used unitialized, so it points to random memory, so the strcpy
would segfault. Better to use strdup
Also, after you put newNode
in the table, you do free(newNode)
making what it points to invalid. This causes the second loop to segfault
Third, in the second loop, if table[i]
is null, the while (tmp->next != NULL)
will segfault
I've annotated and corrected your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;
typedef struct node {
char *word;
struct node *next;
} node;
node *table[10];
// hash function
unsigned int
hash(char *word)
{
// TODO
unsigned int hash = 5381;
int c = 0;
while (c == *word++)
hash = ((hash << 5) + hash) + c;
// NOTE: not a bug but probably better
#if 0
return hash % 10;
#else
return hash % N;
#endif
}
int
main(void)
{
// initialize array heads to NULL
for (int i = 0; i < N; i++) {
table[i] = NULL;
}
// Open file to read
FILE *indata = fopen(dictionary, "r");
if (indata == NULL) {
printf("cant open\n");
return 1;
}
// variable to store words read from the file
char *words = malloc(sizeof(char) * 20);
if (words == NULL) {
printf("no memory\n");
return 1;
}
// While loop to read through the file
while (fgets(words, 20, indata)) {
// get the index of the word using hash function
int index = hash(words);
// create new node
node *newNode = malloc(sizeof(node));
if (newNode == NULL) {
printf("here\n");
return 1;
}
// make the new node the new head of the list
// NOTE/BUG: word is never set to anything valid -- possible segfault here
#if 0
strcpy(newNode->word, words);
#else
newNode->word = strdup(words);
#endif
newNode->next = table[index];
table[index] = newNode;
// free memory
// NOTE/BUG: this will cause the _next_ loop to segfault -- don't deallocate
// the node you just added to the table
#if 0
free(newNode);
#endif
}
// free memory
free(words);
// loop to print out the values of the hash table
for (int i = 0; i < N; i++) {
node *tmp = table[i];
// NOTE/BUG: this test fails if the tmp is originally NULL (i.e. no entries
// in the given hash index)
#if 0
while (tmp->next != NULL) {
#else
while (tmp != NULL) {
#endif
printf("%s\n", tmp->word);
tmp = tmp->next;
}
}
// loop to free all memory of the hash table
for (int i = 0; i < N; i++) {
if (table[i] != NULL) {
node *tmp = table[i]->next;
free(table[i]);
table[i] = tmp;
}
}
// close the file
fclose(indata);
}
UPDATE:
I made a linked list program before that stores an integer in the list,
int number; struct node *next;
and I usednewNode->number = 5;
and it worked, why is it in this case it doesn't?? Is it because I am working with strings here??
The difference is that word
is a pointer. It must be assigned a value before it can be used. strcpy
does not assign a value to word
. It tries to use the contents of word
as the destination address of the copy.
But, the other two bugs happen regardless of word
being a char *
vs number
being int
.
If you had defined word
not as a pointer, but as a fixed array [not as good in this usage], the strcpy
would have worked. That is, instead of char *word;
, if you had done (e.g.) char word[5];
But, what you did is better [with the strdup
change] unless you can guarantee that the length of word
can hold the input. strdup
will guarantee that.
But, notice that I [deliberately] made word
have only five chars to illustrate the problem. It means that the word to be added can only be 4 characters in length [we need one extra byte for the nul terminator character]. You'd need to use strncpy
instead of strcpy
but strncpy
has issues [it does not guarantee to add the nul char at the end if the source length is too large].
Conincidentally, there is another question today that has an answer that may help shed some more light on the differences of your word
struct member: Difference between memory allocations of struct member (pointer vs. array) in C