arrays string delphi delphi-7

Parsing a text in Delphi


I have a text file that contains the following data:

dgm P1  
s0:->b1  
*s1:b2->b1  
S2:b2->b1,b3  
dgm P2  
s0:->b2  
*s1:b1,b3->b2

I want to parse this file into an array in which each element contains one dgm block, from its dgm line up to (but not including) the next dgm line. That is, the first element will be:

dgm P1  
s0:->b1  
*s1:b2->b1  
S2:b2->b1,b3

The second element will be:

dgm P2  
s0:->b2  
*s1:b1,b3->b2

etc. How do I go about that in Delphi? I am looking for a better way to do this. I tried loading the file into a TStringList:

begin
  str := TStringList.Create;
  try
    str.LoadFromFile('example.txt');
    for i := 0 to str.Count - 1 do
      if str[i] = 'dgm' then
      begin
        // get the position, add it to an array;
        // get the next position, till the end;
        // use the positions to divide up the string
      end;
  finally
    str.Free;
  end;
end;

However, this is not working, and I think there might be a better way to handle this than the one I briefly outlined.


Solution

  • AS. This answer uses features of Delphi 2010+ because it was written before the asker specified the target Delphi version. Still, this code can serve as a skeleton for an implementation built on whatever libraries and language features are available.

    // Requires Classes, StrUtils and Generics.Collections in the uses clause.
    function ParseDgmStringsList( const str: TStrings ): TArray<TArray<String>>;
    var
      s: string;
      section: TList<String>;
      receiver: TList<TArray<String>>;
    
      procedure FlushSection;
      begin
        if section.Count > 0 then begin
           receiver.Add( section.ToArray() );
           section.Clear;
        end;
      end;
    begin
      section := nil;
      receiver := TList<TArray<String>>.Create;
      try
        section := TList<String>.Create;
    
        for s in str do begin
          if StartsText('dgm ', s) then // or StartsStr
             FlushSection;   
          section.Add( s );
        end;
    
        FlushSection;
        Result := receiver.ToArray();
      finally
        receiver.Free;
        section.Free;
      end;
    end;
    

    http://docwiki.embarcadero.com/Libraries/Seattle/en/System.Generics.Collections.TList_Properties

    PS. Note that using AnsiContainsStr(str, 'dgm') is fragile and hardly correct: it will generate false positives on lines like S2:b2->bcdgmaz,b3. You should check that dgm starts the string and that it is a separate word rather than part of some longer word (in other words, search for 'dgm' + #32 instead of mere 'dgm').
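
    The difference between the two checks can be sketched as a small console program. This is only an illustration: the helper name IsDgmHeader is hypothetical, and it assumes every header line has a name after dgm, so a bare 'dgm' line with no trailing space would not match. AnsiStartsStr and AnsiContainsStr both come from StrUtils and should be available in Delphi 7.

    ```pascal
    program DgmPrefixDemo;
    {$APPTYPE CONSOLE}

    uses
      SysUtils, StrUtils;

    // Hypothetical helper: a line counts as a header only when it
    // begins with 'dgm' followed by a space (case-sensitive).
    // Use AnsiStartsText instead for a case-insensitive check.
    function IsDgmHeader(const Line: string): Boolean;
    begin
      Result := AnsiStartsStr('dgm ', Line);
    end;

    begin
      // A real header line: the prefix check accepts it.
      Writeln(IsDgmHeader('dgm P1'));
      // 'dgm' buried inside another token: the prefix check rejects it...
      Writeln(IsDgmHeader('S2:b2->bcdgmaz,b3'));
      // ...while a plain substring search reports a match (the false positive).
      Writeln(AnsiContainsStr('S2:b2->bcdgmaz,b3', 'dgm'));
    end.
    ```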

    PPS. Another thing to consider is how you would handle files that start with non-dgm lines. What would you do with empty lines or indented lines? For example, how would you parse a file like this?

    s8:->b2  
    ;*s1:b1,b3->b2
    dgm P1  
    s0:->b1  
    *s1:b2->b1  
    
    S2:b2->b1,b3  
        dgm P2  
      s0:->b2  
    *s1:b1,b3->b2
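
    One possible set of answers to those questions, sketched without generics so that it should also compile on Delphi 7. The policy here is an assumption, not the only reasonable one: every line is trimmed, blank lines are skipped, and lines that appear before the first dgm header are dropped. The procedure name SplitDgmSections is hypothetical; each section is stored as a TStringList owned by a TObjectList (unit Contnrs).

    ```pascal
    program DgmSectionsDemo;
    {$APPTYPE CONSOLE}

    uses
      SysUtils, Classes, Contnrs, StrUtils;

    // Splits Lines into sections, one TStringList per 'dgm ' header.
    // Assumed policy: trim each line, skip blank lines, and ignore
    // anything that comes before the first header.
    procedure SplitDgmSections(Lines: TStrings; Sections: TObjectList);
    var
      i: Integer;
      Line: string;
      Current: TStringList;
    begin
      Current := nil;
      for i := 0 to Lines.Count - 1 do
      begin
        Line := Trim(Lines[i]);
        if Line = '' then
          Continue;
        if AnsiStartsStr('dgm ', Line) then
        begin
          // A new header starts a new section; Sections owns it.
          Current := TStringList.Create;
          Sections.Add(Current);
        end;
        // Lines seen before any header have no section and are dropped.
        if Current <> nil then
          Current.Add(Line);
      end;
    end;

    var
      Lines: TStringList;
      Sections: TObjectList;
      i: Integer;
    begin
      Sections := nil;
      Lines := TStringList.Create;
      try
        Sections := TObjectList.Create(True); // True: free the sections on destroy
        Lines.LoadFromFile('example.txt');
        SplitDgmSections(Lines, Sections);
        for i := 0 to Sections.Count - 1 do
        begin
          Writeln('--- section ', i, ' ---');
          Write(TStringList(Sections[i]).Text);
        end;
      finally
        Sections.Free;
        Lines.Free;
      end;
    end.
    ```

    With this policy the file above yields two sections, P1 and P2, and the two lines before dgm P1 are discarded. If those leading lines matter, create an unnamed first section instead of testing Current against nil.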