Does Delphi (10.4) have a string-tokenizer that extracts string-token-objects from a string in a similar way as below?
MyPhrase := 'I have a simple word and a complex Word: A lot of WORDS.';
MyTokens := MyTokenize(MyPhrase, 'word');
for i := 0 to MyTokens.Count - 1 do
Memo1.Lines.Add(IntToStr(MyTokens[i].Pos) + ': ' + MyTokens[i].String);
Gives this result in Memo1:
16: word
35: Word
50: WORD
Searching for "tokenize string" in the Delphi documentation did not get any useful results for this purpose.
Of course, writing such a function is trivial, but I wonder if there already is a procedure for this in the existing huge Delphi code treasure.
EDIT: I am experimenting with a wordlist that should have the required features:
program MyTokenize;
{$APPTYPE CONSOLE}
{$R *.res}
uses
CodeSiteLogging,
System.RegularExpressions,
System.Types,
System.Classes,
System.StrUtils,
System.SysUtils;
type
PWordRec = ^TWordRec;
TWordRec = record
WordStr: string;
WordPos: Integer;
end;
TWordList = class(TList)
private
function Get(Index: Integer): PWordRec;
public
destructor Destroy; override;
function Add(Value: PWordRec): Integer;
property Items[Index: Integer]: PWordRec read Get; default;
end;
function TWordList.Add(Value: PWordRec): Integer;
begin
Result := inherited Add(Value);
end;
destructor TWordList.Destroy;
var
i: Integer;
begin
for i := 0 to Count - 1 do
FreeMem(Items[i]);
inherited;
end;
function TWordList.Get(Index: Integer): PWordRec;
begin
Result := PWordRec(inherited Get(Index));
end;
var
WordList: TWordList;
WordRec: PWordRec;
i: Integer;
begin
try
//MyPhrase := 'A crossword contains words but not WORD';
WordList := TWordList.Create;
try
// AV only at the THIRD loop!!!
for i := 0 to 2 do
begin
GetMem(WordRec, SizeOf(TWordRec));
WordRec.WordPos := i;
WordRec.WordStr := IntToStr(i);
WordList.Add(WordRec);
end;
for i := 0 to WordList.Count - 1 do
Writeln('WordPos: ', WordList[i].WordPos, ' WordStr: ', WordList[i].WordStr);
WriteLn(' Press Enter to free the list');
ReadLn;
finally
WordList.Free;
end;
except
on E: Exception do
begin
Writeln(E.ClassName, ': ', E.Message);
ReadLn;
end;
end;
end.
Unfortunately, it has a strange bug: It gets an AV exactly at the THIRD for loop!
EDIT2: It seems that the AV happens only when the project's Build Configuration is set to Debug
. When the project's Build Configuration is set to Release
then there is no AV. Has this to do with the MemoryManager?
On request, here is how I would do this myself:
First, I want to create a function that performs this operation, so it can be reused every time we need to do this.
I could have this function return or populate a TList<TWordRec>
, but then it would be tiresome to work with it, because the user of the function would then need to add try..finally
blocks every time the function is used. Instead, I let it return a TArray<TWordRec>
. By definition, this is simply array of TWordRec
, that is, a dynamic array of TWordRec
s.
But how to efficiently populate such an array? We all know you shouldn't increase the length of a dynamic array one element at a time; besides, that requires a lot of code. Instead, I populate a local TList<TWordRec>
and then, as a last step, create an array from it:
type
TPhraseMatch = record
Position: Integer;
Text: string;
end;
function GetPhraseMatches(const AText, APhrase: string): TArray<TPhraseMatch>;
begin
var TextLower := AText.ToLower;
var PhraseLower := APhrase.ToLower;
var List := TList<TPhraseMatch>.Create;
try
var p := 0;
repeat
p := Pos(PhraseLower, TextLower, p + 1);
if p <> 0 then
begin
var Match: TPhraseMatch;
Match.Position := p - 1 {since the OP wants 0-based string indexing};
Match.Text := Copy(AText, p, APhrase.Length);
List.Add(Match);
end;
until p = 0;
Result := List.ToArray;
finally
List.Free;
end;
end;
Notice that I chose an alternative to the regular expression approach, just for educational reasons. I also believe this approach is faster. Also notice how easy it is to work with the TList<TWordRec>
: it's just like a TStringList
but with word records instead of strings!
Now, let's use this function:
procedure TWordFinderForm.ePhraseChange(Sender: TObject);
begin
lbMatches.Items.BeginUpdate;
try
lbMatches.Items.Clear;
for var Match in GetPhraseMatches(mText.Text, ePhrase.Text) do
lbMatches.Items.Add(Match.Position.ToString + ':'#32 + Match.Text)
finally
lbMatches.Items.EndUpdate;
end;
end;
Had I not chosen to use a function, but placed all code in one block, I could have iterated over the TList<TWordRec>
in exactly the same way:
for var Match in List do