Tags: c++, string, debugging, set, word-count

Wordcount function having trouble adding words to a set of unique words


I'm writing a wordcount function that reads input from stdin into a string, then evaluates the string and returns the number of words, the number of lines, the size of the string, and the number of unique words.

My problem is adding words to the set of unique words. When my code adds a word to the set, the whitespace after it is counted as part of the word and pushed into the set along with it. For example, with this input:

this is
        is
a test
test

Output

a
test
is test this
line is 4
Words = 7
size is 27
Unique is 6

It counts 7 words in total and 6 unique words, although the input contains only 6 words, 4 of them unique. I tried debugging it by printing intermediate values as I go so I can keep track of where I went wrong, and I can only conclude that the problem lies in my if statements. How can I get past this? I've been stuck for some time now.

Here is my code:

#include<cstdio>    // fread, stdin
#include<iostream>
#include<string>
#include<set>
using std::string;
using std::set;
using std::cin;
using std::cout;

set<string> UNIQUE;

size_t sfind(const string& s) // takes a string, counts its words, and adds them to the set
{
    string a;
    int linecount = 0;
    int state = 0;               //0 represents reading whitespace/tab, 1 = reading letter  
    int count = 0;              //word count
    for(size_t i =0; i < s.length(); i++) {
        a+=s[i];                                          //add to new string to add to set
        if(state ==0) {                                  //start at whitespace       
            if(state != ' ' && state != '\t') {         //we didn't read whitespace
                count++;
                state =1;
            }
        }
        else if(s[i]== ' ' || s[i] == '\t' || s[i] == '\n') {
            state = 0;
            UNIQUE.insert(a);                   //add to UNIQUE words
            a.clear();                         // clear and reset the string
        }
        if (s[i] == '\n') {
            linecount++;
        }
    }
    for(set<string>::iterator i = UNIQUE.begin(); i!= UNIQUE.end(); i++) {
        cout << *i;
    }

    cout << '\n';
    cout << "line is " << linecount << '\n';
    return count;
}

int main()
{
    char c;
    string s; 
    while(fread(&c,1,1,stdin)) {
        s+=c;   //read element add to string
    }

    cout << "Words = " << sfind(s) << '\n';
    cout << "size is " << s.length() << '\n';
    cout << "Unique is "<< UNIQUE.size() << '\n';  
    return 0;
}

Also, I will keep reading the input with

fread(&c,1,1,stdin)

because I will be using it later on in a larger wordcount function.


Solution
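The original loop has two bugs. The test `state != ' ' && state != '\t'` compares `state` (which is 0 at that point) instead of the character `s[i]`, so any character read after whitespace, including more whitespace, starts a new word. And `a+=s[i]` runs before the whitespace check, so the separator that ends a word is stored in the set together with it. A word at the very end of the input is also never inserted unless whitespace follows it. Below is a minimal sketch of the same state-machine approach with those points fixed, still reading stdin with `fread`. `Counts` and `readStdin` are hypothetical names introduced here, and, as in the question, only space, tab and newline are treated as separators.

```cpp
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>

// Results of one pass over the input (hypothetical helper type).
struct Counts {
    std::size_t words = 0;
    std::size_t lines = 0;
    std::set<std::string> unique;  // words stored without surrounding whitespace
};

// Reads all of stdin one byte at a time with fread, as the question requires.
std::string readStdin()
{
    std::string s;
    char c;
    while (std::fread(&c, 1, 1, stdin) == 1) {
        s += c;
    }
    return s;
}

// Same state machine as the original sfind, with the bugs fixed:
// the test looks at the character c instead of state, a character is
// appended to the current word only when it is not whitespace, and the
// last word is inserted even if no whitespace follows it.
Counts sfind(const std::string& s)
{
    Counts result;
    std::string word;      // characters of the word being read
    bool inWord = false;   // false = reading whitespace, true = reading a word

    for (char c : s) {
        bool space = (c == ' ' || c == '\t' || c == '\n');
        if (!space) {
            if (!inWord) {          // first character of a new word
                ++result.words;
                inWord = true;
            }
            word += c;
        } else if (inWord) {        // whitespace right after a word ends it
            result.unique.insert(word);
            word.clear();
            inWord = false;
        }
        if (c == '\n') {
            ++result.lines;
        }
    }
    if (inWord) {                   // input ended in the middle of a word
        result.unique.insert(word);
    }
    return result;
}
```

In `main`, `std::string s = readStdin();` followed by `Counts c = sfind(s);` gives the line, word and unique-word counts, with `s.length()` as the size. Returning the counts instead of printing inside `sfind` also keeps the output order predictable when the results are printed in one `cout` chain.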

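  • Rather than writing code that tries to parse the string on spaces, use std::istringstream to do the parsing. Its operator>> skips leading whitespace (spaces, tabs and newlines) and reads up to the next whitespace character, so the extracted words never include separators.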

    Here is an example:

    #include <string>
    #include <iostream>
    #include <sstream>
    #include <set>
    
    int main()
    {
        std::set<std::string> stringSet;           // each distinct word is stored once
        std::string line;
        while (std::getline(std::cin, line))       // read one line at a time
        {
            std::istringstream oneline(line);
            std::string word;
            while (oneline >> word)                // >> skips the spaces and tabs between words
            {
                std::cout << word << "\n";
                stringSet.insert(word);
            }
        }
    
        std::cout << "\n\nThere are " << stringSet.size() << " unique words\n";
    }
    

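Since the question also needs the line, word and size counts and wants to keep reading input with `fread`, the same `std::istringstream` idea can be applied to the buffered string. This is a sketch under those assumptions; `WordStats`, `readAllStdin` and `countWords` are hypothetical names, not part of the answer above.

```cpp
#include <cstddef>
#include <cstdio>
#include <set>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct WordStats {
    std::size_t lines = 0;
    std::size_t words = 0;
    std::size_t size = 0;          // number of characters read
    std::set<std::string> unique;
};

// Reads all of stdin byte by byte with fread, as in the question.
std::string readAllStdin()
{
    std::string s;
    char c;
    while (std::fread(&c, 1, 1, stdin) == 1) {
        s += c;
    }
    return s;
}

// Lets std::istringstream do the word splitting: operator>> skips any
// leading whitespace and stops at the next whitespace character, so the
// extracted words never contain separators.
WordStats countWords(const std::string& text)
{
    WordStats stats;
    stats.size = text.size();

    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {   // one iteration per line
        ++stats.lines;
        std::istringstream oneline(line);
        std::string word;
        while (oneline >> word) {
            ++stats.words;
            stats.unique.insert(word);
        }
    }
    return stats;
}
```

Counting lines with `std::getline` also counts a last line that has no trailing newline. If you want the `wc -l` behaviour of counting newline characters instead, `std::count(text.begin(), text.end(), '\n')` from `<algorithm>` gives that.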