Reading through a dict file, finding "words" and adding them to a trie


For this problem I have to read through a dictionary file and distinguish what a word is. A word does not need to be meaningful, i.e. a word can be asdas, sdgsgd, dog, sweet, etc. To access the dictionary file I must go through a mapping file.

FILE *map, *dictfile, *datafile;
char *dictname, *dataname;
map = fopen(argv[1],"r");
while (fgets(buffer,sizeof(buffer),map) != NULL)
{
    dictname = strtok(buffer," ");
    dataname = strtok(NULL, " ");
    strtok(dictname,"\n"); // strip a trailing newline, if any
    strtok(dataname,"\n");

That code goes into the mapping file and then distinguishes what the dictionary and data file names are. From there I open the file:

if((dictfile = fopen(dictname,"r")) != NULL) //error checking
{
    // in here I have to call readDict(dictfile)
}
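
The mapping-file step can also be written with a single delimiter set, so the trailing newline is dropped in the same pass as the space. This is only a sketch: splitMappingLine is a made-up helper name, and it assumes each line of the mapping file holds exactly two whitespace-separated names, as the snippet above suggests.

```c
#include <string.h>

/* Hypothetical helper (not from the question): split one mapping-file line
   of the assumed form "<dictname> <dataname>\n" in place.
   Returns 1 if both names were found, 0 otherwise. */
int splitMappingLine(char *line, char **dictname, char **dataname)
{
    /* Spaces, tabs and line endings are all delimiters, so the trailing
       newline never ends up inside a file name. */
    const char *delims = " \t\r\n";

    *dictname = strtok(line, delims);
    /* Only ask for a second token if the first one exists. */
    *dataname = (*dictname != NULL) ? strtok(NULL, delims) : NULL;

    return *dictname != NULL && *dataname != NULL;
}
```

Inside the fgets loop this would be called as splitMappingLine(buffer, &dictname, &dataname), skipping the line when it returns 0.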

My problem is in readDict. I have to go character by character through this dict file to distinguish what is actually a word and what isn't. A word can consist of any alphabetical characters. Let's say the file contains: dictionary$@#$LoL!@#FFDAfg(()). The words in this are: dictionary, LoL, FFDAfg. I need to read through these characters, and if a character is a letter I need to either add it directly into the trie (I haven't figured out how to manage a trie by adding only one character at a time), or keep track of each character in a string and, once I reach a non-alphabetical character, add that "word" into the trie.

my trie structure is:

struct trieNode
{
    bool isWord;
    struct trieNode *children[26]; //26 given there are 26 letters in the alphabet
};

I have the method

struct trieNode *createNode()
{
    int i;
    struct trieNode *tmp = (struct trieNode*)malloc(sizeof(struct trieNode));
    for (i = 0; i < 26; i++)
        tmp->children[i] = NULL;

    tmp->isWord = false;
    return tmp;
}

my current insert method is:

void insert(char *key)
{
    int level;
    int index;
    int len = strlen(key);

    if (root == NULL)
        root = createNode(); //root is defined under my struct def as: struct trieNode *root = NULL;
    struct trieNode *tmp = root;
    for (level = 0; level < len; level++)
    {
        index = getIndex(key[level]); //previously defined just gets the index of where the key should go
        if (tmp->children[index] == NULL)
            tmp->children[index] = createNode();

        tmp = tmp->children[index];
    }
    tmp->isWord = true; //mark the node where the word ends
}
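
getIndex is referenced above but never shown. A minimal sketch of what such a helper might look like, assuming it should map both upper- and lowercase letters to 0..25 and that the character set is ASCII-compatible (letters contiguous); the -1 return for non-letters is an added assumption, not something stated in the question.

```c
#include <ctype.h>

/* Hypothetical getIndex: 'a'..'z' and 'A'..'Z' map to 0..25,
   anything else maps to -1. Assumes the default "C" locale and an
   ASCII-compatible character set. */
int getIndex(char ch)
{
    /* <ctype.h> functions need a value in the unsigned char range. */
    unsigned char uc = (unsigned char)ch;
    int index;

    if (!isalpha(uc))
        return -1;

    index = tolower(uc) - 'a';
    /* Guard against letters outside a..z in unusual locales. */
    return (index >= 0 && index < 26) ? index : -1;
}
```

With a helper like this, a caller can reject any character for which the result is negative before touching the children array.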

I believe this method would work if I end up inserting a string into the trie, but my problem is that I am unsure how to get a string out of the dict file in readDict. I am also not sure how to modify this (if possible) to insert one char at a time, so that I can just read through my file char by char and, after checking that a char is a letter and converting it to lowercase, add it into the trie if it is not already there.


Solution

  • So one rough way of doing it is something like this. You'll probably need to add a few more conditions to handle some edge-cases.

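    // Needs <stdio.h> and <ctype.h>; insert() is the function from the question.
    void readDict(const char *fileName)
    {
        FILE *file = fopen(fileName, "r");
        if (file == NULL)
            return; // could not open the dict file

        char word[100]; // fixed-size buffer; letters beyond 99 in one run are dropped
        int index = 0;
        int c;
        while ((c = fgetc(file)) != EOF)
        {
            if (isalpha(c)) // c is a letter: store it in lowercase
            {
                if (index < (int)sizeof(word) - 1)
                    word[index++] = (char)tolower(c);
            }
            else if (index > 0) // a non-letter ends the current word
            {
                word[index] = '\0';
                insert(word);
                index = 0;
            }
        }
        if (index > 0) // the file may end in the middle of a word
        {
            word[index] = '\0';
            insert(word);
        }
        fclose(file);
    }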
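
  • The question also asks whether the trie can be filled one character at a time, without building a string first. A minimal sketch of that idea, assuming the same trieNode layout as in the question: keep a cursor into the trie, step down one level on each letter, and on any non-letter mark the current node as a word and jump back to the root. The names newNode, feedChar, readDictStream and contains are made up for this example and do not come from the question or the code above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct trieNode
{
    bool isWord;
    struct trieNode *children[26];
};

/* Allocate an empty node; exits on allocation failure to keep the sketch short. */
struct trieNode *newNode(void)
{
    struct trieNode *n = malloc(sizeof *n);
    if (n == NULL)
    {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->isWord = false;
    for (int i = 0; i < 26; i++)
        n->children[i] = NULL;
    return n;
}

/* Feed one character (an unsigned char value, as returned by fgetc).
   cur is the current position: root between words, deeper inside a word.
   Returns the new position. */
struct trieNode *feedChar(struct trieNode *root, struct trieNode *cur, int c)
{
    int i = isalpha(c) ? tolower(c) - 'a' : -1;
    if (i >= 0 && i < 26)
    {
        if (cur->children[i] == NULL)
            cur->children[i] = newNode();
        return cur->children[i]; /* step one level down */
    }
    if (cur != root)
        cur->isWord = true; /* a non-letter ends the current word */
    return root;
}

/* Read an already opened file character by character into the trie. */
void readDictStream(FILE *file, struct trieNode *root)
{
    struct trieNode *cur = root;
    int c;
    while ((c = fgetc(file)) != EOF)
        cur = feedChar(root, cur, c);
    if (cur != root)
        cur->isWord = true; /* the file ended in the middle of a word */
}

/* Case-insensitive lookup, used here only to check the result. */
bool contains(const struct trieNode *root, const char *key)
{
    const struct trieNode *cur = root;
    for (; *key != '\0'; key++)
    {
        int i = tolower((unsigned char)*key) - 'a';
        if (i < 0 || i > 25 || cur->children[i] == NULL)
            return false;
        cur = cur->children[i];
    }
    return cur->isWord;
}
```

    Compared with the buffer version, this has no word-length limit and needs no temporary string, at the cost of keeping the cursor as extra state between characters.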