This is an algorithmic question; any pseudocode/verbal explanation will do (although a Python solution outline would be ideal).
Let's have a query word A
, for example pity
. And let's have a set of other strings, B
s, each of which is composed of one or more space-delimited words: pious teddy
, piston tank yard
, pesky industrial strength tylenol
, oh pity is me!
etc.
The goal is to identify those strings B
from which A
could be constructed. Here "constructed" means we could take prefixes of one or more words in B
, in order, and concatenate them together to get A
.
Example:
On the other hand, pious teddy
should not be identified, because there's no way we can take prefixes of the words pious
and teddy
and concatenate them into pity
.
The checking should be fast (ideally some regexp), because the set of strings B is potentially large.
You can use a pattern of \bp(?:i|\w*(?>\h+\w+)*?\h+i)(?:t|\w*(?>\h+\w+)*?\h+t)(?:y|\w*(?>\h+\w+)*?\h+y)
to match those words. It assumes spaces to be used as word separators. This is rather easy to construct, just take the first letter of your word to be matched, then loop over the others and construct (?:[letter]|\w*(?>\h+\w+)*?\h+[letter])
from them.
This pattern is basically an unrolled version of \bp(?:i|.*?\bi)(?:t|.*?\bt)(?:y|.*?\by)
, which matters the second to last letters either as exactly next letter or first letter (because of the word boundary) of a next word.
You can see it in action here: https://regex101.com/r/r3ZVNE/2
I have added the last sample as non-matching one for some tests I did with atomic groups.
In Delphi I would do it like this:
program ProjectTest;
uses
System.SysUtils,
RegularExpressions;
procedure CheckSpecialMatches(const Matchword: string; const CheckList: array of string);
var
I: Integer;
Pat, C: string;
RegEx: TRegEx;
begin
assert(Matchword.Length > 0);
Pat := '\b' + Matchword[1];
for I := Low(Matchword) + 1 to High(Matchword) do
Pat := Pat + Format('(?:%0:s|\w*(?>\h+\w+)*?\h+%0:s)', [Matchword[I]]);
RegEx := TRegEx.Create(Pat, [roCompiled, roIgnoreCase]);
for C in CheckList do
if Regex.IsMatch(C) then
WriteLn(C);
end;
const
CheckList: array[0..3] of string = (
'pious teddy',
'piston tank yard',
'pesky industrial strength tylenol',
'prison is ity');
Matchword = 'pity';
begin
CheckSpecialMatches(Matchword, CheckList);
ReadLn;
end
.