I'm trying to make a Notepad clone, with the added feature of a running word count in the status bar. The word count is inaccurate, counting repeated spaces or carriage returns as new words. Here's what I've tried for the word count feature:
procedure TFormMain.Memo1Change(Sender: TObject);
var
  wordSeparatorSet: Set of Char;
  count: integer;
  i: integer;
  s: string;
  inWord: Boolean;
begin
  wordSeparatorSet := [#13, #32]; // CR, space
  count := 0;
  s := Memo1.Text;
  inWord := False;
  for i := 1 to Length(s) do
  begin
    // if the char is a CR or space, you're at the end of a word; increase the count
    if (s[i] in wordSeparatorSet) and (inWord=True) then
    begin
      Inc(count);
      inWord := False;
    end
    else
    // the char is not a delimiter, so you're in a word
    begin
      inWord := True;
    end;
  end;
  // OK, all done counting. If you're still inside a word, don't forget to count it too
  if inWord then
    Inc(count);
  StatusBar1.Panels[0].Text := 'Words: ' + IntToStr(count);
end;
Of course, I'm open to any alternatives or improvements. I really don't understand why this code increases the word count (count) with every space and carriage return. I would think that after the user hits the space bar (incrementing count), the variable inWord should now be False, so if (s[i] in wordSeparatorSet) and (inWord=True) should resolve to False if the user hits the space bar or Enter key a second time. But that's not what happens.
"I really don't understand why this code increases the word count (count) with every space and carriage return."
At the first space after a word, you do indeed set inWord to False. But if the next character is also a space, the condition is False (because inWord is False), so the else branch runs and you (erroneously) execute inWord := True. If the next (third) character is again a space, you will then (erroneously) do Inc(count).
You can also notice that the negation of (s[i] in wordSeparatorSet) and (inWord=True) does NOT imply that "the char is not a delimiter", because of the conjunction with inWord. By De Morgan, the negation is not (s[i] in wordSeparatorSet) or not (inWord=True), which is NOT the same thing as not (s[i] in wordSeparatorSet).
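To make the De Morgan point concrete, here is a minimal sketch that keeps the question's "count at the end of a word" approach but tests the delimiter first and inWord second, so the else branch really does mean "not a delimiter". The function name CountWords and the console wrapper are hypothetical, just for illustration. I have also added #9 (tab) and #10 (LF) to the set, which is my own assumption: a memo normally separates lines with CR+LF, and with only #13 in the set the LF would be treated as part of a word.

```pascal
program NestedTestDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: the question's loop with the two tests separated.
function CountWords(const s: string): Integer;
const
  // Assumption: tab and LF added to the question's CR and space.
  WordSeparatorSet = [#9, #10, #13, #32];
var
  i: Integer;
  inWord: Boolean;
begin
  Result := 0;
  inWord := False;
  for i := 1 to Length(s) do
    if s[i] in WordSeparatorSet then
    begin
      // A delimiter ends a word only if we were inside one.
      if inWord then
        Inc(Result);
      inWord := False;
    end
    else
      // Reached only when the char is NOT a delimiter.
      inWord := True;
  // Count a word that runs up to the end of the text.
  if inWord then
    Inc(Result);
end;

begin
  Writeln(CountWords('one  two   three')); // prints 3
end.
```

In a Unicode Delphi, s[i] in WordSeparatorSet compiles with a warning about wide characters in set expressions; CharInSet from SysUtils is the usual way to silence it.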
A fixed version would look more like
function WordCount(const AText: string): Integer;
var
  InWord: Boolean;
  i: Integer;
begin
  Result := 0;
  InWord := False;
  for i := 1 to Length(AText) do
    if InWord then
    begin
      if IsWordSep(AText[i]) then
        InWord := False;
    end
    else
    begin
      if not IsWordSep(AText[i]) then
      begin
        InWord := True;
        Inc(Result);
      end;
    end;
end;
where IsWordSep(chr) is defined as something like chr.IsWhitespace, but there are many subtleties, as I discuss at length on my web site.
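For completeness, here is one way the pieces might fit together as a small console program. This is only a sketch under some assumptions: IsWordSep is implemented with the IsWhiteSpace method of the Char helper from System.Character, which needs a reasonably recent Delphi, and the program name and test string are made up for the demonstration. It treats every whitespace character as a separator and ignores the subtleties mentioned above (punctuation, hyphens and so on).

```pascal
program WordCountDemo;

{$APPTYPE CONSOLE}

uses
  System.Character;

// Assumption: "word separator" simply means "whitespace character".
function IsWordSep(const C: Char): Boolean;
begin
  Result := C.IsWhiteSpace;
end;

// Counts a word at the moment it starts, so runs of separators
// and trailing separators never add to the count.
function WordCount(const AText: string): Integer;
var
  InWord: Boolean;
  i: Integer;
begin
  Result := 0;
  InWord := False;
  for i := 1 to Length(AText) do
    if InWord then
    begin
      if IsWordSep(AText[i]) then
        InWord := False;
    end
    else
    begin
      if not IsWordSep(AText[i]) then
      begin
        InWord := True;
        Inc(Result);
      end;
    end;
end;

begin
  // Repeated spaces and a CR+LF line break between three words.
  Writeln(WordCount('hello   world'#13#10'again')); // prints 3
end.
```

In the form from the question, the Memo1Change handler would then shrink to a single statement: StatusBar1.Panels[0].Text := 'Words: ' + IntToStr(WordCount(Memo1.Text));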