Tags: c, arrays, string, pointers, tokenize

Array of tokenized string literals in C


I'm writing a C program to tokenize an input text file and track the frequency of word length, alongside tracking and storing the corresponding words themselves. I have the word count working fine, but can't get my word_tracker array to store the strings correctly:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#define MAX_LENGTH 34
#define MAX_WORDS 750

int main(int argc, char *argv[]){ 

    FILE *fp; //input file
    const char *cur; //stores current word as string literal
    char words[MAX_LENGTH*MAX_WORDS]; //stores all words from text file
    char file_name[100]; //stores file name
    int word_count[MAX_LENGTH] = {0}; //array to store frequency of words based on length
    const char *word_tracker[MAX_LENGTH][MAX_WORDS]; //array to store string literals of each word, indexed by char count and 
    int char_count; //current word's char count

    printf("Enter a file name: ");
    scanf("%s", file_name);
    fp = fopen(file_name, "r"); 

    if((fp==NULL)){
        printf("Failure: missing or unopenable file");
        return -1; 
    }else{
        while(fgets(words, sizeof(words), fp)){
            cur= strtok(words, " -.,\b\t\n"); //first word of line
            char_count = strlen(cur);
            word_count[char_count-1] = word_count[char_count-1]+1; //increment frequency of specific word length
            word_tracker[char_count-1][word_count[char_count-1]-1] = cur; //store string into corresponding array index location

            /*test printing*/
            printf("%d:", char_count-1); 
            printf("%s ", word_tracker[char_count-1][(word_count[char_count-1])-1]); 

            while(cur){
                    cur = strtok(NULL, " -.,\b\t\n"); //next word
                    if(cur){
                        char_count = strlen(cur);
                        word_count[char_count-1] = word_count[char_count-1]+1; //increment frequency of specific word length
                        word_tracker[char_count-1][word_count[char_count-1]-1] = cur; //store string into corresponding array index location

                        /*test printing*/
                        printf("%d:", char_count-1); //test print
                        printf("%s ", word_tracker[char_count-1][(word_count[char_count-1])-1]); //test print

                    }
                }
            }
        }
//Testing word_tracker: (This doesn't work)
    printf("\n\n%s \n", word_tracker[0][0]);
    printf("\n%s \n", word_tracker[1][0]);
    printf("%s \n", word_tracker[2][0]);
    printf("%s \n", word_tracker[3][0]);
    printf("%s \n", word_tracker[4][0]);
    printf("%s \n", word_tracker[5][0]);

    return 0;
}

The "interior" tests (within the tokenizing loop) work well: the correct string and length are printed. However, the print tests at the end of main print seemingly random strings rather than the words the input text file says they should print. I have three theories on what I am doing wrong:

1) My indexing is wrong

2) My understanding of how to populate and use char* arrays is incorrect

3) My tokenizing loop is incorrect (does cur not equal "the isolated string"?)

I've noticed that the tests at the end of main display variants of whatever is written on the final line of the input file, so I think that my tokenizing loop is likely wrong. Any guidance is greatly appreciated, thank you!


Solution

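  • strtok does not copy anything: it returns pointers into the words buffer, and fgets overwrites that buffer on every line. Your result array currently is const char *word_tracker[MAX_LENGTH][MAX_WORDS], which is a 2D array of pointers, so it stores only addresses inside words, not the words themselves. After the loop, every stored pointer refers to whatever the last line left in that buffer, which is why the final tests print variants of the last line. To keep the words, copy them: you could either (a) use a 1D array of pointers and allocate memory for each word found, or (b) use a 2D array of characters and strcpy each word to the proper position.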

    So (a) would look like...

    const char *word_tracker[MAX_WORDS];
    ...
    word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS] = strdup(cur);
    

    And (b) would look like

    char word_tracker[MAX_WORDS][MAX_LENGTH];
    ...
    strncpy(word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS], cur, MAX_LENGTH);
    word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS][MAX_LENGTH-1] = '\0';
    

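    Note that in (b), MAX_LENGTH indicates the maximum length of a string (i.e. a single word) and is therefore the second index. strncpy copies at most MAX_LENGTH characters, so it never writes past the space reserved for a word. It does not add a terminator when the word is MAX_LENGTH characters or longer, which is why the last character is set to '\0' explicitly.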
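
    If you would rather keep the layout from the question (indexed by word length first, then by occurrence), a variant of (a) could look like the sketch below. It is only one possible arrangement, not the only fix. The names copy_word and track_line are hypothetical helpers made up for this example; copy_word does the same job as strdup, but uses only standard C. Words that are too long, or that would overflow a row, are simply skipped here, which is an assumption about what you want.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_LENGTH 34   /* longest word length tracked, as in the question */
    #define MAX_WORDS 750   /* most words stored per length, as in the question */

    /* Hypothetical helper: returns a heap copy of s, or NULL if malloc fails.
       Same effect as POSIX strdup. The caller owns the copy and should free it. */
    static char *copy_word(const char *s)
    {
        size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
        char *p = malloc(n);
        if (p != NULL)
            memcpy(p, s, n);
        return p;
    }

    /* Hypothetical helper: tokenizes line in place and stores a private copy of
       each word in tracker[length-1][...], updating counts[length-1].
       Returns the number of words stored from this line. */
    int track_line(char *line,
                   char *tracker[MAX_LENGTH][MAX_WORDS],
                   int counts[MAX_LENGTH])
    {
        static const char delims[] = " -.,\b\t\n";
        int stored = 0;

        /* strtok never returns an empty token, so len is at least 1 below. */
        for (char *cur = strtok(line, delims); cur != NULL;
             cur = strtok(NULL, delims)) {
            size_t len = strlen(cur);

            if (len > MAX_LENGTH || counts[len - 1] >= MAX_WORDS)
                continue;           /* assumption: skip words that do not fit */

            char *copy = copy_word(cur);
            if (copy == NULL)
                break;              /* out of memory: stop storing */

            /* Store the copy, not cur: cur points into the reused line buffer. */
            tracker[len - 1][counts[len - 1]++] = copy;
            stored++;
        }
        return stored;
    }
    ```

    In main you would declare char *word_tracker[MAX_LENGTH][MAX_WORDS] (without const, so the copies can be passed to free later) and call track_line(words, word_tracker, word_count) inside the fgets loop. Because the for loop tests cur before using it, a line that contains no words at all is handled as well.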