The below is a code to find number of words in a given sentence using recursion.
It seems that the two given "cout" statements result in different outputs and i cannot seem to understand the reason why?
#include <iostream>
using std::cin;
using std::cout;
int count_end(char *a, int start, int ctr) {
if (a[start] == '.') {
cout << ctr << "\n";
return ctr;
}
else if (a[start] == ' ' && a[start - 1] != ' ') {
ctr++;
}
start = count_end(a, start + 1, ctr);
return ctr;
}
int main() {
char sentence[1000];
cin.getline(sentence, 1000);
int x;
x = count_end(sentence, 1, 1);
cout << "is count" << x << "\n";
return 0;
}
I tried writing cout statements in later part of codes to judge the control flow. It seems the control flow is fine. I know of different ways to implement the same thing which give the correct output but I wonder why this particular method doesn't work.
The comment by @Christian Stieber identifies the immediate problem:
When you do the recursive call to
count_end
, you wanted to assign the return value toctr
.
//start = count_end(a, start + 1, ctr); // Wrong!
ctr = count_end(a, start + 1, ctr); // Right.
With this fix, the two cout
expressions display the same value. For the trial run below, I entered a famous quotation by Mahatma Gandhi.
An ounce of patience is worth more than a tonne of preaching.
12
is count12
You can clean up the output by by removing cout
from function count_end
, and rewording the message.
// main.cpp
#include <iostream>
using std::cin;
using std::cout;
int count_end(char* a, int start, int ctr) {
if (a[start] == '.') {
//cout << ctr << "\n"; // Delete this.
return ctr;
}
else if (a[start] == ' ' && a[start - 1] != ' ') {
ctr++;
}
//start = count_end(a, start + 1, ctr); // Wrong!
ctr = count_end(a, start + 1, ctr); // Right.
return ctr;
}
int main() {
char sentence[1000];
cout
<< "WORD COUNTER\n\n"
<< "Enter some text: ";
cin.getline(sentence, 1000);
cout << '\n';
int x;
x = count_end(sentence, 1, 1);
cout << "Word count: " << x << "\n\n";
return 0;
}
// end file: main.cpp
Output:
WORD COUNTER
Enter some text: An ounce of patience is worth more than a tonne of preaching.
Word count: 12
The algorithm used by this program requires that a sentence end with a period. In case none is found, it will overrun the string, and possibly, the buffer, in a fruitless effort to find one.
WORD COUNTER
Enter some text: An ounce of patience is worth more than a tonne of preaching
Word count: 14
Other times, it will misread an ellipsis (...
), and declare the end of a sentence, when, in fact, it is still in the middle.
WORD COUNTER
Enter some text: It will misread an ellipsis (`...`), and declare the end of a sentence.
Word count: 6
A period alone, without any other characters, led to this:
WORD COUNTER
Enter some text: .
Word count: 4
The limiting case of the I-forgot-the-period comes when you forget the whole sentence! For this run, I just pressed Enter, without entering anything.
WORD COUNTER
Enter some text:
Word count: 3
A one-word entry, followed by a space, led to this output:
WORD COUNTER
Enter some text: one
Word count: 5
And a simple space alone gave me this:
WORD COUNTER
Enter some text:
Word count: 3
It is obvious, by now, that recursion should stop when the null character ('\0'
) is found.
// Recursion stops at the end of the string.
if (a[start] == '\0') {
return ctr;
}
This fixes many of the issues above, but won't solve the problem of the empty string or trailing spaces.
The end of a word occurs when a non-whitespace character is followed by either whitespace, or the null character, which marks the end of the C-string. The program in the OP, fails to check the latter case.
// ...
else if (a[start] != ' ' // non-whitespace
&& (a[start + 1] == '\0' || a[start + 1] == ' ')) { // followed by null or whitespace
ctr++;
}
Because this test looks forward in the string, rather than backward, as does the test in the OP, recursion can start properly in function main
, with both position and word count set to zero. That will prevent a couple of off-by-one errors from occurring.
It is okay to look forward, because the test to detect the end of a word does not occur until after it has been determined that a[start]
is not the terminating null character. Examining the character at a[start + 1]
will not overrun the string.
The following two helper functions are used to extend this test to any whitespace character.
bool is_space(unsigned char const c) {
// Note: Per CppReference, `std::isspace` should be called with
// an unsigned argument.
// https://en.cppreference.com/w/cpp/string/byte/isspace
return std::isspace(c);
}
bool is_null_or_space(unsigned char const c) {
return !c || std::isspace(c);
}
This leads to:
// Count word endings.
// The end of a word occurs when a non-whitespace character
// is followed by either whitespace, or the null character,
// which marks the end of the C-string.
if (!is_space(a[start]) && is_null_or_space(a[start + 1])) {
++ctr;
}
This can be accomplished by changing one line in function main
:
// from function `main`
x = count_end(sentence, 0, 0);
I prefer, however, to change the name of function count_end
to the more expressive count_word_endings
. At the same time, I am going to provide a couple of default arguments to get the recursion started:
int count_word_endings(
char* a, // null-terminated C-string
int start = 0, // start search at `a[start]`
int ctr = 0) // word counter
{
// ...
}
The following program fixes the issues described above.
Recursion starts with the call to count_word_endings(text)
in function main
.
// main.cpp
#include <cctype>
#include <iostream>
bool is_space(unsigned char const c) {
// Note: Per CppReference, `std::isspace` should be called with
// an unsigned argument.
// https://en.cppreference.com/w/cpp/string/byte/isspace
return std::isspace(c);
}
bool is_null_or_space(unsigned char const c) {
return !c || std::isspace(c);
}
int count_word_endings(
char* a, // null-terminated C-string
int start = 0, // start search at `a[start]`
int ctr = 0) // word counter
{
// Recursion stops at the end of the string.
if (a[start] == '\0') {
return ctr;
}
// Count word endings.
// The end of a word occurs when a non-whitespace character
// is followed by either whitespace, or the null character,
// which marks the end of the C-string.
if (!is_space(a[start]) && is_null_or_space(a[start + 1])) {
++ctr;
}
// Continue recursion, starting at the next character.
return count_word_endings(a, start + 1, ctr);
}
int main() {
enum : std::size_t { BUFFER_SIZE = 1000 };
char text[BUFFER_SIZE];
std::cout
<< "WORD COUNTER\n\n"
<< "Enter some text: ";
std::cin.getline(text, BUFFER_SIZE);
std::cout << "\nWord count: " << count_word_endings(text) << "\n\n";
return 0;
}
// end file: main.cpp