Search code examples
cdata-structurestriecs50

Trie Implementation in C: Segmentation Fault


I'm currently doing CS50 from Harvard and the goal is it to load a dictionary into any data structure in the fastest way possible. For this problem set I'm using a Trie.

The logic behind my code is as follows:

  1. Read one character at a time.
  2. Check in the childnode of the trie if the character already exists, if it equals NULL, we allocate some space to it.
  3. The curser is set to the childnode we just allocated space to.
  4. If we reach the end of a word ("\n") we set the boolean value to true and completely reset the curser to its initial value (which we have stored previously in the curser->root).

I have tried several implementations, some of them had a few logical errors which I was not happy with and some gave me segmentation faults when I had a big dictionary.

Below is the code of my latest implementation, basically what happens is that it's fine with loading the first word into the trie structure, but it fails at the second. The problem then lays into me setting the new node value to the childenode (to which we allocated some free space). The logic behind this is it to obviously connect the tree and move on to the next node. This is the code which I think is wrong:

curser = curser->children[tolower(ch) - 'a'];

But the thing is, it worked in some of my other implementations, only with this one it stopped working all of a sudden and gave me a segmentation fault after the first word. As I said, I'm a beginner in coding so please enlighten me and criticize my implementation! Thanks a bunch.

#include <stdbool.h>
#include <stdio.h>
#include "dictionary.h"
#include <ctype.h>
#include <stdlib.h>

typedef struct node
{
    bool end;
    struct node* children[27];
    struct node* root;
    struct node* next;
} node;

//global variable used to check the number of words in the trie
int totalwords = 0;

//root node
node* curser;

int ch;

int main(void)
{
    FILE* dict = fopen("text.txt", "r");
    if (dict == NULL)
    {
        printf("Could not open dictionary\n");
        return 1;
    }

    curser = (struct node*) malloc(sizeof(node));
    curser->root = curser;

    for (ch = fgetc(dict); ch != EOF; ch = fgetc(dict))
    {
        if (ch == '\0')
        {
            curser->end = true;
            curser = curser->root;
            totalwords++;
            printf("%i\n", totalwords);
        }

        else
        {
            if (isalpha(ch))
            {
                if (curser->children[tolower(ch) - 'a'] == NULL)
                {
                    curser->children[tolower(ch) - 'a'] = (struct node*)malloc(sizeof(node));
                }
                curser = curser->children[tolower(ch) - 'a'];
            }

            else if (ch == '\'')
            {
                if (curser->children[26] == NULL)
                {
                    curser->children[26] = (struct node*)malloc(sizeof(node));
                }
                curser = curser->children[26];
            }
        }
    }

    fclose(dict);
    return false;
}

Edit:

Another question I have is why my in my current code it is not able to detect the Null Terminator \0 but it can detect a new line \n? I need to be able to detect the null terminator in order to get the correct amount of words. Any suggestion as to what is wrong?


Solution

  • After curser->root=curser; You should do the following:

    curser->end=false;
    curser->next=NULL;
    for(i=0;i<27;i++)
        curser->children[i]=NULL;
    

    When you initialize memory for curser it is not guaranteed that it's members will be automatically allocated to NULL and false.

    Do this everywhere for a node you are allocating memory dynamically.

    You also need to set child->root=curser->root for every children you are allocating memory dynamically