
How to extract a substring after a delimiter, but ignore the delimiter if it is found between two tags?


I need to extract a sub-string after a specific delimiter, but if the specified delimiter is between two other tags it should be ignored.

For example, take this test string:

The quick <"@brown fox"> jumps over the lazy dog. The quick @brown fox jumps over the lazy dog

The desired output would be:

brown fox jumps over the lazy dog

This is because the first @ delimiter found is between two " " tags and so should be ignored. The second @ delimiter is not inside " " tags, so the text after it should be extracted.

I can find the position of the @ delimiter with Pos and extract the text to the right of it, as shown below:

procedure TForm1.Button1Click(Sender: TObject);
var
  S: string;
  I: Integer;
begin
  S := 'The quick <"@brown fox"> jumps over the lazy dog. The quick @brown fox jumps over the lazy dog';
  I := Pos('@', S);
  if I > 0 then
  begin
    ShowMessage(Copy(S, I, Length(S)));
  end;
end;

However, this always finds the first @ delimiter, regardless of whether it is enclosed in " " tags (and the delimiter itself is included, because Copy starts at I). The result of the above is:

@brown fox"> jumps over the lazy dog. The quick @brown fox jumps over the lazy dog

where the desired result should be:

brown fox jumps over the lazy dog

How can I change the code to ignore @ delimiters when using Pos if the delimiter is between two " " tags? I only want to find the first @ delimiter and copy the text afterwards.

It also does not matter if there are other @ delimiters after the first valid one; they should simply be dropped from the result. For example, this input should also be valid:

The quick <"@brown fox"> jumps over the lazy dog. The quick @brown fox jumps@ ov@er the lazy@ dog

Should still return:

brown fox jumps over the lazy dog

Because we are only interested in the first valid @ delimiter, ignoring anything else afterwards and ignoring anything between two " " tags.

Please note that although I have tagged Delphi, I primarily use Lazarus, so ideally I need a solution that does not rely on Delphi-only conveniences such as string helpers.

Thanks.


Solution

  • To find out whether an @ lies outside the enclosing " tags, parse the string from the beginning and track whether a tag is currently open.

    If a delimiter is found after an opening tag that is never closed, this routine extracts the result as well. Any further @ delimiters are removed from the extracted text.

    function ExtractString(const s: String): String;
    var
      tagOpen: Boolean;
      delimiterPos,i,j: Integer;
    begin
      tagOpen := false;
      delimiterPos := 0;
      Result := '';
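      for i := 1 to Length(s) do begin
        if (s[i] = '"') then begin
          // Every quote toggles the tag state and discards any
          // delimiter remembered from before this quote.
          tagOpen := not tagOpen;
          delimiterPos := 0;
        end
        else begin
          if (s[i] = '@') then begin
            // Remember only the first delimiter since the last quote.
            if (delimiterPos = 0) then
              delimiterPos := i;
            if not tagOpen then // Found answer
              Break;
          end;
        end;
      end;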
    
      // delimiterPos > 0 means a delimiter was found outside the tags,
      // or after an opening tag that was never closed.
      if (delimiterPos > 0) then begin
        // Finally extract the string and remove all `@` delimiters.
        SetLength(Result,Length(s)-delimiterPos);
        j := 0;
        for i := 1 to Length(Result) do begin
          Inc(delimiterPos);
          if (s[delimiterPos] <> '@') then begin
            Inc(j);
            Result[j] := s[delimiterPos];
          end;
        end;
        SetLength(Result,j);      
      end;
    end;
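
For comparison, here is a minimal sketch of the same scan split into two steps (locate the delimiter, then strip the remaining ones), wrapped in a small console program that runs the inputs from the question. The names FindDelimiter, StripChar and ExtractAfter are hypothetical helpers made up for this illustration, and passing the delimiter and quote characters as parameters is an assumption rather than something the question asked for. It uses only Length, Copy and SetLength, so it should build with both Free Pascal/Lazarus and Delphi.

```pascal
program ExtractDemo;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: returns the position of the first Delim that is
// outside a Quote pair, or that follows a Quote which is never closed.
// Returns 0 if there is no such delimiter.
function FindDelimiter(const S: string; Delim, Quote: Char): Integer;
var
  I: Integer;
  InQuotes: Boolean;
begin
  Result := 0;
  InQuotes := False;
  for I := 1 to Length(S) do
    if S[I] = Quote then
    begin
      InQuotes := not InQuotes;
      Result := 0; // forget any delimiter seen before this quote
    end
    else if S[I] = Delim then
    begin
      if Result = 0 then
        Result := I; // first delimiter since the last quote
      if not InQuotes then
        Exit; // outside any quote pair: this is the answer
    end;
end;

// Hypothetical helper: returns S with every occurrence of Ch removed.
function StripChar(const S: string; Ch: Char): string;
var
  I, J: Integer;
begin
  SetLength(Result, Length(S));
  J := 0;
  for I := 1 to Length(S) do
    if S[I] <> Ch then
    begin
      Inc(J);
      Result[J] := S[I];
    end;
  SetLength(Result, J);
end;

// Text after the first valid delimiter, with later delimiters removed.
function ExtractAfter(const S: string; Delim, Quote: Char): string;
var
  P: Integer;
begin
  P := FindDelimiter(S, Delim, Quote);
  if P > 0 then
    Result := StripChar(Copy(S, P + 1, Length(S) - P), Delim)
  else
    Result := '';
end;

begin
  // Second test string from the question.
  WriteLn(ExtractAfter('The quick <"@brown fox"> jumps over the lazy dog. ' +
    'The quick @brown fox jumps@ ov@er the lazy@ dog', '@', '"'));
  // prints: brown fox jumps over the lazy dog

  // Unclosed tag: the delimiter after the opening quote is still used.
  WriteLn(ExtractAfter('open "tag @never closed', '@', '"'));
  // prints: never closed
end.
```

Keeping the search separate from the clean-up means the position is available on its own, which is closer to how Pos is used in the question.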