delphi delphi-7

How to efficiently check if a string contains one of a few substrings?


How can I efficiently check whether a string contains one of a few substrings? Suppose I have a string:

`Hi there, <B>my</B> name is Joe <DIV>.</DIV> Hello world. &nbsp;`

How can I check if the string contains either `<B>` OR `<DIV>` OR `&nbsp;`?

I could do a simple:

Result := (Pos('<B>', S) > 0) or 
          (Pos('<DIV>', S) > 0) or 
          (Pos('&nbsp;', S) > 0);

But this seems very inefficient, since it makes up to N passes over the string (one per substring) and my strings are quite large.


Solution

  • Slightly better version:

    // Requires SysUtils (CompareMem).
    function StringContainsAny(const S: string; const AnyOf: array of string): Boolean;
    var
      CurrChr, C: PChar;
      i, j, Ln: Integer;
    begin
      Result := False;
      for i := 1 to Length(S) do
      begin
        CurrChr := @S[i];
        for j := 0 to High(AnyOf) do
        begin
          Ln := Length(AnyOf[j]);
          if Ln = 0 then // skip empty patterns
            Continue;

          C := PChar(AnyOf[j]);
          if C^ <> CurrChr^ then // cheap first-character test
            Continue;

          if (Length(S) + 1 - i) < Ln then // pattern does not fit in the rest of S
            Continue;

          if CompareMem(C, CurrChr, Ln * SizeOf(Char)) then
          begin
            Result := True; // Exit(True) needs Delphi 2009+, so Delphi 7 uses Result + Exit
            Exit;
          end;
        end;
      end;
    end;
    

    You can also build a table of stop symbols to improve speed. It's a fairly complex topic, so I can only suggest reading, for example, Bill Smyth's book "Computing Patterns in Strings".
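
As a rough illustration of the table idea, here is a minimal sketch of a much simpler relative: a lookup table of the characters that can start a pattern, so most positions in the text are rejected with one array access. It is not the Boyer-Moore-style stop-symbol table the book covers. The name `ContainsAnyWithTable` and the type `TCharFlags` are made up for this example, and the code assumes single-byte `AnsiString` data, as in Delphi 7.

```pascal
program ContainsAnyDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical helper type: one flag per possible single-byte character.
  TCharFlags = array[AnsiChar] of Boolean;

// Hypothetical function: single pass over S, using a first-character table
// to decide quickly whether any pattern could start at the current position.
function ContainsAnyWithTable(const S: AnsiString;
  const AnyOf: array of AnsiString): Boolean;
var
  Starts: TCharFlags;
  i, j, k, Ln: Integer;
  Matched: Boolean;
begin
  Result := False;

  // Build the table once: Starts[c] is True if some pattern begins with c.
  FillChar(Starts, SizeOf(Starts), 0);
  for j := 0 to High(AnyOf) do
    if AnyOf[j] <> '' then
      Starts[AnyOf[j][1]] := True;

  for i := 1 to Length(S) do
  begin
    // No pattern starts with this character, so move on immediately.
    if not Starts[S[i]] then
      Continue;

    for j := 0 to High(AnyOf) do
    begin
      Ln := Length(AnyOf[j]);
      // Skip empty patterns and patterns longer than the remaining text.
      if (Ln = 0) or (Ln > Length(S) - i + 1) then
        Continue;

      // Compare the pattern with the text starting at position i.
      Matched := True;
      for k := 1 to Ln do
        if S[i + k - 1] <> AnyOf[j][k] then
        begin
          Matched := False;
          Break;
        end;

      if Matched then
      begin
        Result := True;
        Exit;
      end;
    end;
  end;
end;

begin
  // Contains '<B>', so a match is reported.
  Writeln(ContainsAnyWithTable('Hi there, <B>my</B> name is Joe', ['<B>', '<DIV>', '&nbsp;']));
  // Contains neither '<' nor '&', so every position is rejected by the table.
  Writeln(ContainsAnyWithTable('Hello world.', ['<B>', '<DIV>', '&nbsp;']));
end.
```

With only a handful of patterns the gain over the version above is likely modest, since both stop at the first character most of the time. The table mainly pays off as the number of patterns grows, because the per-position cost no longer depends on it.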