
Counting occurrences of a word


I'm trying to count the occurrences of a given word in an input file. I can correctly count the occurrences of a letter/character, but when I try to count a word, the program just returns 0. What am I doing wrong?

ifstream input("input.txt");
input.open("input.txt");
string video = "video", ands = "and";
string str1((istreambuf_iterator<char>(input)),
    istreambuf_iterator<char>());
int videocount = 0, sentcount = 0, wordcount = 0, wordcountand = 0, wordcountand2 = 0;
for (int i = 0; i < str1.length(); i++)
{
    if (str1 == video) {
        ++videocount;
    }

    if (str1[i] == '.') {
        sentcount++;
    }
    if (str1[i] == ' ') {
        wordcount++;
    }
    if (str1 == ands) {
        wordcountand++;
    }
}

Edit: I changed the way the file is read (word by word with operator>>), and now everything works:

while (input >> filewords) {
    wordcount++;
    if (filewords == word1) {
        ++videocount;
    }
    if (filewords == word2) {
        wordcountand++;
    }
    for (int i = 0; i < filewords.length(); i++) {
        if (filewords[i] == '.') {
            sentcount++;
        }
    }
}

Solution

  • Basically, the question has already been answered in the comments. You cannot compare a search string to the complete text of the file that is stored in your variable "str1": that comparison is false unless the whole file consists of exactly that one word.

    The equality operator == compares complete strings; it does not search for substrings. This already leads us to the algorithm we want: we extract each word from the text with std::string::substr and compare only that word. The function parameters are:

    • start position
    • length of sub-string

    So we need to find the start position and the end position of each word, where the end position is the index of the first character after the word. The length of the word is then "end position" minus "start position".

    But how do we identify a word? A word usually consists of alphanumeric characters. If we iterate through the complete text and compare the previously checked character with the current one, we can state the following:

    • If the previous character was NOT alphanumeric and the current one is, we have found the beginning of a word. We remember its index, the start position of that word.
    • If the previous character was alphanumeric and the current one is not, we have found the end of a word. The current index is the end position (one past the last character of the word), so now we know both positions and can extract the word and compare it.

    Then something like word = str1.substr(startPosition, endPosition - startPosition); gives us a single word, which we can compare with our search words, for example:

    if (word == video) ++videocount;
    

    But we can go further: with a very simple standard technique, we can store and count all words at once. For that, we can use a std::map or a std::unordered_map together with its index operator, operator[]. Its documentation contains this important sentence:

    Returns a reference to the value that is mapped to a key equivalent to key, performing an insertion if such key does not already exist.

    So it either creates a new entry or finds an existing one. In both cases, a reference to the (already existing or newly created) value is returned, and we increment it. A newly created counter is value-initialized to 0, so the first increment sets it to 1. This can end up in something like:

    wordCounter[text.substr(startIndexOfWord, index - startIndexOfWord)]++;
    

    So here we first build a substring with the algorithm described above. This substring is then either found in or added to the std::map. In both cases, a reference to its counter is returned, which we increment.

    At the end, we will simply output all words and counters.

    The following proposal uses C++17 features such as the if statement with initializer and structured bindings, so you need to enable C++17 in your compiler (for example, -std=c++17 with GCC or Clang, or /std:c++17 with MSVC).

    Here is the complete program:

    #include <iostream>
    #include <fstream>
    #include <string>
    #include <iterator>
    #include <cctype>
    #include <map>
    #include <iomanip>
    
    int main() {
    
        // Open the input file and check whether that worked
        if (std::ifstream ifs("input.txt"); ifs) {
    
            // Read the complete text file into a string variable
            std::string text(std::istreambuf_iterator<char>(ifs), {});
    
            // Define the counters
            size_t sentenceCounter{};
            std::map<std::string, size_t> wordCounter{};
            size_t overallWordCounter{};
    
            // The character that was examined in the previous iteration
            char lastCharacter{};
    
            // Here we store the index where a word starts
            size_t startIndexOfWord{};
    
            // std::isalnum must not get negative char values, so convert to unsigned char first
            auto isWordCharacter = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; };
    
            // Iterate over all characters from the source file. The extra iteration at
            // index == text.length() acts as a separator, so a word at the very end is counted too
            for (size_t index{}; index <= text.length(); ++index) {
    
                // Read the current character ('\0' marks the end of the text)
                const char currentCharacter = (index < text.length()) ? text[index] : '\0';
    
                // Each dot will be counted as an indicator for a sentence
                if ('.' == currentCharacter) ++sentenceCounter;
    
                // Check whether we have found the start of a word. Then we just store the index
                if (isWordCharacter(currentCharacter) and not isWordCharacter(lastCharacter))
                    startIndexOfWord = index;
    
                // Check whether we have found the end of a word. Add it to the map and increment its counter
                if (isWordCharacter(lastCharacter) and not isWordCharacter(currentCharacter))
                    wordCounter[text.substr(startIndexOfWord, index - startIndexOfWord)]++;
    
                // The next lastCharacter is the current character
                lastCharacter = currentCharacter;
            }
    
            // Go through the complete map
            for (const auto& [word, count] : wordCounter) {
                // Show words and counters
                std::cout << std::left << "Word: " << std::setw(30) << word << " Count: " << count << "\n";
                // Calculate overall sum of words
                overallWordCounter += count;
            }
            // Show final result
            std::cout << "\nWords overall: \t" << overallWordCounter << "\nSentences: \t" << sentenceCounter << '\n';
        }
        else {
            std::cerr << "\n***Error: Could not open input file.\n";
        }
        return 0;
    }
    
    

    Of course, there are many other possible solutions, especially with std::regex.

    If you have any questions, I am happy to answer them.