Tags: c, dictionary, dynamic-arrays, trie, calloc

Segmentation fault when loading a dictionary: is it caused by calloc, sys/stat.h, or something else?


This function is supposed to load a dictionary into a trie. I wanted to find out how large the dictionary file was so I could calloc all the memory at once. The reason is that all the nodes would then be located close together in memory and could therefore take advantage of hardware features (such as the CPU cache) that speed up the search. To do this, I found recommendations for two methods. One of them is sys/stat.h, which you will see in my code.
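A minimal sketch of the sys/stat.h approach on its own, checking the return value of stat() (which the load function in this question does not do). The helper name file_size is hypothetical, and it returns the size as long long because st_size is an off_t, which may be wider than int.

```c
#include <stdio.h>     /* perror */
#include <sys/stat.h>  /* stat, struct stat */

/* Hypothetical helper (not from the original post): returns the size in
   bytes of the file at `path`, or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat s;

    /* stat() returns 0 on success and -1 on error; on error the contents
       of `s` must not be used. */
    if (stat(path, &s) != 0)
    {
        perror(path);
        return -1;
    }

    /* st_size is an off_t, which can be wider than int. */
    return (long long) s.st_size;
}
```

The result could then be used as the element count passed to calloc, as the question does.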

When I run this code I get a "segmentation fault", which I know means I am trying to access memory I don't have permission to access. Using GDB, I found that the segmentation fault occurs on line 116 (i.e., the line that reads "else if (cur->children[key] == NULL)"), and that the value of key at that point is 12. At first I thought the problem was my use of calloc or sys/stat.h, since those are the two things I used that I know the least about. However, the more I research them, the less likely this seems. If it is not one of these, I have no idea where to look next.

Below is only the code I believe to be relevant:

#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <sys/stat.h>

#include "dictionary.h"

typedef struct node
{
    bool end[26]; 

    struct node* children[26]; 
} node;

node* start;

int key;
int last;
int dic_count;

bool load(const char* dictionary)
{
    struct stat s;
    stat(dictionary, &s);
    int size = s.st_size;

    dic_count = 0;

    int z = 1;

    FILE* dic = fopen(dictionary, "r");
    if (dic == NULL)
    {
        return false;
    }

    start = calloc(size, sizeof(node));

    if (start == NULL)
    {
        return false;
    }

    int l = 0;
    int d;

    node* cur = &start[0];

    while (0 != (d = fgetc(dic)))
    {
        int d = fgetc(dic);

        if (l > 0)
        {
            last = key;
        }

        l = 1;

        key = d - 'a';

        if (d == '\n')
        {
            cur->end[last] = true;
            cur = &start[0];
            dic_count++;
        }
        else if (cur->children[key] == NULL)
        {
            node* new = &start[z];

            cur->children[key] = new;

            z++;

            if (cur->children[key] == NULL)
            {
                return false;
            }

            cur = cur->children[key];
        }
        else
        {
            cur = cur->children[key];
        }
    }
    return true;
}

Any help is greatly appreciated.


Solution

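  • Are you sure your file contains a binary 0? If you are trying to read until the end of the file, test the fgetc result against EOF, not 0. Otherwise your loop never terminates: once fgetc starts returning EOF, key becomes EOF - 'a' (a negative number), and cur->children[key] reads outside the node.

    Besides that, you are only processing every second character.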

    Expanding as requested:

    From man fgetc:

    fgetc(), getc() and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error

    You are probably confusing it with the return value of fgets, which returns NULL (a null pointer) at end of file.

    while ((ch = fgetc(fp)) != EOF)
    

    is safe and sound. Another possible source of confusion is the unsoundness of the following loop, since feof() only returns true after a read has already hit end of file:

    while (!feof(fp))
    

    Now, regarding unprocessed characters: you wrote

        while (0 != (d = fgetc(dic)))
        {
            int d = fgetc(dic);
    

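    The code reads a character in the while expression, compares it to 0, and then reads the next character into a new d that shadows the outer one. The character read in the condition is never used, so on every iteration one character is lost.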
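Putting these points together, a minimal sketch of a corrected load might look like the following. It keeps the question's node layout, single calloc pool and end-marker scheme, and changes the loop to call fgetc once per iteration and stop at EOF. The rest are assumptions of this sketch rather than part of the answer: the stat() return value is checked, the pool gets size + 1 nodes so the root always fits, tolower() folds uppercase letters, any character outside a-z (for example an apostrophe) makes load fail instead of indexing outside children[26], and "dictionary.h" is left out so the snippet compiles on its own.

```c
#include <ctype.h>     /* tolower */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

typedef struct node
{
    bool end[26];
    struct node *children[26];
} node;

node *start;    /* pool of nodes; start[0] is the root */
int dic_count;  /* number of words loaded */

bool load(const char *dictionary)
{
    struct stat s;
    if (stat(dictionary, &s) != 0)
    {
        return false;   /* could not determine the file size */
    }

    /* Each byte creates at most one node, plus the root, so this bounds
       the number of nodes the loop below can need. */
    size_t capacity = (size_t) s.st_size + 1;

    FILE *dic = fopen(dictionary, "r");
    if (dic == NULL)
    {
        return false;
    }

    start = calloc(capacity, sizeof(node));
    if (start == NULL)
    {
        fclose(dic);
        return false;
    }

    dic_count = 0;
    size_t z = 1;       /* next unused node in the pool */
    int last = -1;      /* index of the previous letter; -1 at the start of a word */
    node *cur = &start[0];
    bool ok = true;
    int d;

    /* One fgetc per iteration, and stop at EOF rather than at 0. */
    while ((d = fgetc(dic)) != EOF)
    {
        if (d == '\n')
        {
            if (last >= 0)              /* skip empty lines */
            {
                cur->end[last] = true;  /* same end-marker scheme as the question */
                dic_count++;
            }
            cur = &start[0];
            last = -1;
            continue;
        }

        /* Assumption: words contain only letters; tolower is an addition. */
        int key = tolower(d) - 'a';
        if (key < 0 || key >= 26)
        {
            ok = false;                 /* would index outside children[26] */
            break;
        }

        if (cur->children[key] == NULL)
        {
            if (z >= capacity)          /* defensive: pool exhausted */
            {
                ok = false;
                break;
            }
            cur->children[key] = &start[z++];
        }

        cur = cur->children[key];
        last = key;
    }

    fclose(dic);
    if (!ok)
    {
        free(start);                    /* release the pool on failure */
        start = NULL;
    }
    return ok;
}
```

A lookup would have to follow the same convention: walk children for each letter of the word, then check end[] of the node reached, at the index of the word's last letter.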