Search code examples
ctrie

c- sysmalloc assertion problems


I am hoping that someone can help me understand where I have gone wrong here. I am implementing a program to check for spelling correctness. In the process I use a trie data structure to load into memory a dictionary text file to check words against.

Overall it seems to operate as expected but I get a lot of problems when loading in the longest possible word, namely pneumonoultramicroscopicsilicovolcanoconiosis. I do not understand why but first let me present some code -

/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char *dictionary)
{
    FILE *dict = fopen(dictionary, "r");
    if (dict == NULL)
    {
        fprintf(stderr, "Could not open %s dictionary file.\n", dictionary);
        return false;
    }

    // Initialise the root t_node
    root = (t_node *) malloc(sizeof(t_node));
    if (root == NULL)
    {
        fprintf(stderr, "Could not allocate memory to trie structure.\n");
        return false;
    }

    // Set all current values in root to NULL and is_word to false
    for (int i = 0; i < ALPHA_SIZE; i++)
    {
        root->branch[i] = NULL;
    }
    root->is_word = false;


    while (1)
    {
        // Create char aray to hold words from .txt dictionary file once read 
        char *word = (char *) malloc((LENGTH + 1) * sizeof(char));

        if (fscanf(dict, "%s", word) == EOF)
        {
            free(word);
            break;
        }

        t_node *cursor = root;
        int len = strlen(word) + 1;
        for (int i = 0; i < len; i++)
        {
            if (word[i] == '\0')
            {
                cursor->is_word = true;
                cursor = root;
                word_count++;
            }

            else
            {
                int index = (word[i] == '\'') ? ALPHA_SIZE - 1 : tolower(word[i]) - 'a';
                if (cursor->branch[index] == NULL)
                {
                    cursor->branch[index] = (t_node *) malloc(sizeof(t_node));
                    for (int j = 0; j < ALPHA_SIZE; j++)
                    {
                        cursor->branch[index]->branch[i] = NULL;
                    }
                    cursor->branch[index]->is_word = false;
                }
            cursor = cursor->branch[index];
            }
        }

        free(word);
    }

    fclose(dict);
    return true;
}

This is my entire function to load in a dictionary into memory. For reference I defined the trie structure and created root prior to this function. LENGTH is defined as 45 to account for the longest possible word. And ALPHA_SIZE is 27 to include lower case letters plus apostrophes.

As I said already with all other shorter words this function works well. But with the longest word the function works through about half of the word, getting up to index 29 of the word variable before suffering a sysmalloc assertion issue where it then aborts.

I've tried to find what is happening here but the most I can see is that it suffers the fault at -

cursor->branch[index] = (t_node *) malloc(sizeof(t_node));

once it gets to the 29th index of word, but no other indexes prior. And all other posts I can find relate to functions giving this error that do not work at all rather than most of the time with an exception.

Can anyone see what I cannot and what may be the error I made in this code? I'd appreciate any help and thank you all for your time taken to consider my problem.

* UPDATE *

First of all I want to thank everyone for all of their help. I was so pleasantly surprised to see how many people responded to my issue and how quickly they did! I cannot express my gratitude to all of you for your help. Especially Basile Starynkevitch who gave me a great amount of information and offered a lot of help.

I am extremely embarrassed to say that I have found my issue and it's something that I should have caught a LONG time before turning to SO. So I must apologise for using up everyone's time on something so silly. My problem lied here -

else
{
    int index = (word[i] == '\'') ? ALPHA_SIZE - 1 : tolower(word[i]) - 'a';
    if (cursor->branch[index] == NULL)
    {
        cursor->branch[index] = (t_node *) malloc(sizeof(t_node));
        for (int j = 0; j < ALPHA_SIZE; j++)
        {
            cursor->branch[index]->branch[j] = NULL; // <<< PROBLEM WAS HERE
        }
        cursor->branch[index]->is_word = false;
    }
    cursor = cursor->branch[index];
}

In my code originally I had 'cursor->branch[index]->branch[i] = NULL' where I was iterating through 'int j' in that loop, not i ....

Sooooo once again thank you all for your help! I am sorry for my poorly formatted question and I will do better to abide by the SO guidelines in the future.


Solution

  • Your

      char *word = (char *) malloc((LENGTH + 1) * sizeof(char));
    

    is not followed by a test on failure of malloc; you need to add:

      if (!word) { perror("malloc word"); exit(EXIT_FAILURE); }
    

    before

              if (fscanf(dict, "%s", word) == EOF)
    

    since using fscanf with %s on a NULL pointer is wrong (undefined behavior, probably).

    BTW, recent versions of fscanf (or with dynamic memory TR) accepts the %ms specifier to allocate a string when reading it. On those systems you could:

     char*word = NULL;
     if (fscanf(dict, "%ms", &word) == EOF))
       break;
    

    and some systems have getline, see this.

    At last, compile with all warnings and debug info (gcc -Wall -Wextra -g with GCC), improve your code to get no warnings, and use the debugger gdb and valgrind.

    BTW pneumonoultramicroscopicsilicovolcanoconiosis has 45 letters. You need one additional byte for the terminating NUL (otherwise you have a buffer overflow). So your LENGTH should be at least 46 (and I recommend choosing something slightly bigger, perhaps 64; in fact I recommend using systematically C dynamic memory allocation and avoiding hard-coding such limits and code in a more robust style, following the GNU coding standards).