
Storing tokens in struct


Sorry for the duplicate question, but I'm very new to C programming and can't wrap my head around how to apply previous answers on the same topic to my own code.

I am to read in text from either a file on disk or stdin, sort the words, and then present the user with a list of word occurrences (the most frequent word at the top, then in descending order).

I'm currently stuck on storing my tokenised words in a suitable way so that I can count and sort them later. I've decided to go with a struct.

I've written a test-file where I use fgets from stdin to feed it with data.

This is the code:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

int main(int argc, char const *argv[])
{
    struct words
    {
        char word[500];
        unsigned int count;
    };

    int size = 500;
    char *buffer;
    char token;
    struct words w;

    #ifdef DEBUG
    printf("--!DEBUG INFO!-- \n Right before the 4-loop now\n--!DEBUG INFO!--\n");
    #endif
    for (int i = 0; i < 10; ++i)
    {
        printf("Please enter word\n");
        fgets(buffer, size, stdin);
        #ifdef DEBUG
        printf("--!DEBUG INFO!-- \n %c\n--!DEBUG INFO!--\n", buffer);
        #endif
        token = strtok(buffer[i], "\n");
        strcpy(w.word[i], token);
        #ifdef DEBUG
        printf("--!DEBUG INFO!-- \n %c\n--!DEBUG INFO!--\n", w.word[i]);
        #endif
    }

    for (int i = 0; i < 10; ++i)
    {
        printf("%c\n", w.word[i]);
    }
    return 0;
}

When compiling I get a whole bunch of warning messages, most of them stating something similar to this:

incompatible pointer to integer conversion assigning to 'char' from
      'char *'; dereference with * [-Wint-conversion]
                token = strtok(buffer[i], "\n");

The program does, however, compile and run until I give it data and hit enter. After that, it crashes with a Segmentation fault: 11 message:

./tok_struct 
--!DEBUG INFO!-- 
 Right before the 4-loop now
--!DEBUG INFO!--
Please enter word
Test 
Segmentation fault: 11

I'm very grateful for any help I can get!


Solution

  • For one thing, buffer needs storage allocated for it; in your code it is just an uninitialized pointer.

    Once you call fgets(buffer, ...) you are in undefined behavior territory, because buffer does not point to memory where the input can be stored.

    So first declare buffer as an array:

    char buffer[512]; // or whatever size you deem is appropriate
    

    Then read each line into the buffer. Instead of a for loop, use while; you could also check the line length and leave the loop if the user didn't enter anything:

    while (fgets(buffer, sizeof(buffer), stdin) != NULL)
    {
      char* token = strtok(buffer, "\n"); 
      if (token != NULL)
      {
       // With "\n" as the only separator, token is the whole line.
       // To get each word, add a space to the separators, since one
       // can assume the words are separated by spaces, and call
       // strtok repeatedly (passing NULL after the first call):
       // for (char* token = strtok(buffer, " \n");
       //       token != NULL; 
       //       token = strtok(NULL, " \n"))
       // {
       //  .. here you store your tokens
       // }
      }
    }
    

    To store the tokens, the struct cannot be used the way you have it: char word[500] is just one character array, so w.word[i] is a single char, and using that as the target of strcpy makes no sense.

    Instead you need an array of structs:

    struct words w[200]; // or how many words you are expected to handle
    

    Now, for each word you find, look through the array to see whether it already exists. If it does, increment its counter; otherwise copy the word in and set its counter to 1. You should initialize the array so that every count starts at 0, and keep track of how many words are stored in the array, e.g. in wordsFound:

    int wordsFound = 0;
    for (char* token = strtok(buffer, " \n"); token != NULL; token = strtok(NULL, " \n"))
    {
      ...
    }
    

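    A final note: strtok modifies the buffer that is passed to it, and the pointer it returns points into that buffer, so you cannot simply store the returned pointer; the text it points to is overwritten by the next fgets. Either copy the token into the struct's array as above, or allocate space and copy it there.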

    Normally one would not use a fixed-size array of words but, for instance, a linked list that grows whenever a new word is found. This example could also be extended with a faster lookup, but I guess that is not your goal for now.
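
The steps above could be put together roughly as follows. This is only a minimal sketch under some assumptions, not a definitive implementation: the names MAX_WORDS, MAX_WORD_LEN, add_word, count_line, by_count_desc and report_word_counts are made up for this example, the 200-word limit is the arbitrary one used above (further new words are silently dropped once it is reached), and sorting with qsort is just one way to get the "most frequent first" list the question asks for.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 200     /* assumed capacity, same as w[200] above */
#define MAX_WORD_LEN 500  /* same size as word[500] in the question */

struct words
{
    char word[MAX_WORD_LEN];
    unsigned int count;
};

/* Count one word: increment its counter if it is already stored, otherwise
   copy it into the next free slot. Returns the number of distinct words. */
size_t add_word(struct words *w, size_t wordsFound, const char *token)
{
    for (size_t i = 0; i < wordsFound; ++i)
    {
        if (strcmp(w[i].word, token) == 0)
        {
            ++w[i].count;
            return wordsFound;
        }
    }
    if (wordsFound < MAX_WORDS)
    {
        /* Copy the text itself: token points into the line buffer, which
           is overwritten by the next fgets. snprintf truncates overly long
           words and always terminates the copy. */
        snprintf(w[wordsFound].word, sizeof w[wordsFound].word, "%s", token);
        w[wordsFound].count = 1;
        ++wordsFound;
    }
    return wordsFound;  /* unchanged if the array was already full */
}

/* Split one line on whitespace and count every word in it. strtok writes
   '\0' bytes into line, so line must be a modifiable buffer. */
size_t count_line(struct words *w, size_t wordsFound, char *line)
{
    for (char *token = strtok(line, " \t\r\n"); token != NULL;
         token = strtok(NULL, " \t\r\n"))
    {
        wordsFound = add_word(w, wordsFound, token);
    }
    return wordsFound;
}

/* Comparison function for qsort: larger counts sort first. */
int by_count_desc(const void *a, const void *b)
{
    const struct words *x = a;
    const struct words *y = b;
    return (x->count < y->count) - (x->count > y->count);
}

/* Read all lines from in, then print "count word" lines to out, most
   frequent word first. Returns the number of distinct words. */
size_t report_word_counts(FILE *in, FILE *out)
{
    /* static: zero-initialised and kept off the stack (about 100 KB) */
    static struct words w[MAX_WORDS];
    char buffer[512];
    size_t wordsFound = 0;

    while (fgets(buffer, sizeof buffer, in) != NULL)
    {
        wordsFound = count_line(w, wordsFound, buffer);
    }

    qsort(w, wordsFound, sizeof w[0], by_count_desc);
    for (size_t i = 0; i < wordsFound; ++i)
    {
        fprintf(out, "%u %s\n", w[i].count, w[i].word);
    }
    return wordsFound;
}
```

To try it, call the last function from main:

    int main(void)
    {
        report_word_counts(stdin, stdout);
        return 0;
    }

Given the two input lines "the cat and the dog" and "the dog", the output starts with "3 the" and "2 dog". The order of words with equal counts is not specified, because qsort is not guaranteed to be stable.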