Search code examples
c++sortinglinked-listword-frequencyword-cloud

Linked List Word Frequency and Sort C++


I am writing a program that reads words from a text file and puts all those words in a linked list. The file has no punctuation, only words. I also want to compare the linked list to a pre-loaded blacklist that is also a linked list.

What I have accomplished is that I can load the link list from file, print the link list, check the size, count the frequency of how many times a word appeared in the file, not print the words that are below a specified frequency, and I have also been able to format all the words to lowercase for better handling.

What I am having a problem with is getting the code right so that it only prints an occurrence of a word once that has multiple frequencies. So if the word "the" shows up 20 times, I don't want it to print "the <1>" then "the <2>" the next time it shows up, clear to "the <20>" I just want it to print one time "the <20>"

I am posting my load file function, print functions and insert word functions, all part of the class wordCloud().

Below is the code:

void wordCloud::insertWord(string aWord){
wordNode *newWord = new wordNode(aWord);

//old code
if (head == NULL)
    head = newWord;
else{
    newWord->next = head;
    head = newWord;
}

//revised code
//newWord->next = head;
//head = newWord;
size++;
}

void wordCloud::insertWordDistinct(string word){
for (wordNode *temp = head; temp != NULL; temp = temp->next){
    if (word == temp->myWord){
        temp->freq_count++;
        //cout << temp->freq_count; //for debugging
    }
}
insertWord(word);
}

void wordCloud::printWordCloud(int freq){
wordNode *temp, *previous;
int listSize = 0;

if (head == NULL)                   //determines if there are any words in the list
    cout << "No Word Cloud" << endl;
else{
    temp = head;

    while (temp->next != NULL){         //prints each word until the list is NULL
        if (temp->freq_count >= freq){
            cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
            temp = temp->next;
            listSize++;
        }
        else{
            previous = temp;
            temp = temp->next;
            previous = NULL;
            free(previous);
        }
    }
}
cout << "\nThere are " << size << " words in the file.\n";      //print file size - for debugging - works
cout << "\nThere are " << listSize << " words in the list\n\n";     //print list size - for debugging - works
system("pause");
}

void wordCloud::printBlacklist(){
wordNode *temp;

if (head == NULL)                   //determines if there is a list
    cout << "No Words in the blacklist" << endl;
else{
    temp = head;

    while (temp != NULL){           //prints each word until the list is NULL
        cout << temp->myWord << endl;
        temp = temp->next;
    }
}
cout << "\nThere are " << size << " words in the file.\n\n";        //print size - for debugging - works
system("pause");
}

void wordCloud::loadWordCloud(string fileName){
ifstream file;                      //variable for fileName
string word;                        //string to hold each word

file.open(fileName);                //open file

if (!file) {                        //error handling
    cout << "Error: Can't open the file. File may not exist.\n";
    exit(1);
}

while (!file.eof()){
    file >> word;                   //grab a word from the file one at a time

    insertWordDistinct(changeToLowerCase(word));
    //insertWord(word);             //for debugging
    //cout << word <<'\n';          //print word - for debugging
}

//printWordCloud();                 //print word cloud - for debugging - works
file.close();                       //always make sure to close file after read
}

void wordCloud::loadBlacklist(string fileName){
ifstream file;                      //variable for fileName
string bannedWord;                  //string to hold each word  

file.open(fileName);                //open file

if (!file) {                        //error handling if file does not load
    cout << "Error: Can't open the file. File may not exist.\n";
    exit(1);
}   

while (!file.eof()){
    file >> bannedWord;             //grab a word from the file one at a time

    if (bannedWord.empty()){        //error handling if file is empty
        cout << "File is empty!!\n";
        exit(1);
    }
    insertWord(changeToLowerCase(bannedWord));
    //cout << bannedWord << '\n';   //print blacklist words - for debugging
}

//printBlacklist();                 //print blacklist - for debugging - works
file.close();                       //always make sure to close file after read
}

I notice that if I put previous = NULL before free(), that my program does not crash and I don't get any dll memory handling errors. In fact, I can take free() out totally and it seems to work just fine. I just don't know if this is the correct way to do this at all. It seems to me that if I just point a node to NULL< that it won't necessarily delete the data in memory. I just get uneasy not using free() or delete() to terminate the node. Correct me if I am wrong, or please point me in the right directly.

Pretty much, what is wrong with this:

wordNode *previous, *temp = head;

while (temp != NULL){
    if (word == temp->myWord){
        temp->freq_count++;
        previous = temp;
        temp = temp->next;
        delete(previous);
    }
}

I may be going about this wrong, but basically I just need to find the frequency of each word that is inserted into the list, then delete the multiple nodes that contain that word until only the node with the highest frequency count is left to print. I am trying to do this in insertWordDistinct(string word) to accomplish this. Just not sure how to do it.


Solution

  • Your print loop is doing you no favors at all. It should be a simple enumeration filtering on minimum frequency. no deleting, freeing, or otherwise memory management should be happening. Just walk the list:

    void wordCloud::printWordCloud(int freq)
    {
        int listSize = 0;
        int uniqSize = 0;
        for (wordNode *temp = head; temp; temp = temp->next)
        {
            if (temp->freq_count >= freq)
            {
                cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
                listSize += temp->freq_count;
                ++uniqSize;
            }
        }
    
        cout << "\nThere are " << size << " words in the file.\n";
        cout << "\nThere are " << listSize << " words in the filtered list\n\n";
        cout << "\nThere are " << uniqSize << " unique words in the filtered list\n\n";
        system("pause");
    }
    

    This should also get you back to properly managing your lists in the wordCloud::~wordCloud() destructor to once again properly delete nodes. there are plenty of other things I would do different, but its a learning process so I'm not going to spoil your party.


    Update

    Per request from the OP, below is a sample linked list insertion function that insertion sorts as it builds the list. While adapting this he identified significant differences and problems with the original implementation. Hopefully it helps someone else as well.

    void wordCloud::insert(const std::string& aWord, unsigned int freq)
    {
        // manufacture lower-case version of word;
        std::string lcaseWord = make_lower(aWord);
    
        // search for the word by walking a pointer-to-pointer
        //  through the pointers in the linked list.
        wordNode** pp = &head;
        while (*pp && ((*pp)->myWord < lcaseWord)
            pp = &(*pp)->next;
    
        if (*pp && !(lcaseWord < (*pp)->myWord))
        {
            (*pp)->freq_count++;
        }
        else
        {    // insert the node
            wordNode *node = new wordNode(lcaseWord);
            node->freq_count = freq;
            node->next = *pp;
            *pp = node;
            ++size;
        }
    }