
TFileStream: Read block of data from file and replace it in Delphi 7


I would like to read a block of bytes from a file (e.g. 400 KB), replace some text in the buffer, and then write the result to another file. Originally I tried TFileStream with a buffer declared as an array of bytes, but then I got stuck because StringReplace works on strings. The source data is UTF-8 text. This is what I have:

var
  SS,ST: TFileStream;
  Buffer: string;
  sf,tf,TempStr: string;
  i: Integer;
begin
  sf := 'U:\SYSTEM\enwiktionary-latest-stub-articles\stub-articles.xml';
  tf := 'A:1.txt';
  SS := TFileStream.Create(sf, fmOpenRead);
  ST := TFileStream.Create(tf, fmCreate or fmOpenWrite);
  try
    SS.Read(Buffer, sizeof(Buffer));
    Buffer := stringreplace(Buffer, '<page>','<p>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</page>','</p>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<title>','<t>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</title>','</t>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<ns>','<n', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</ns>','>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<revision>','<r>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</revision>','</r>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<id>','<i', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</id>','>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<parentid>','<pi', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</parentid>','>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<contributor>','', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</contributor>','', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<username>','<u>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</username>','</u>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<comment>','<c>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '</comment>','</c>', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '<text id="','<t =', [rfReplaceAll]);
    Buffer := stringreplace(Buffer, '" bytes="',' b=', [rfReplaceAll]);
    ST.Write(Buffer, sizeof(Buffer));
  finally
    SS.Free;
  end;

The first Buffer := stringreplace(...) line raises a runtime error (Access violation).


Solution

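  • Use an AnsiString buffer for the UTF-8 data and allocate space for the string body before reading into it. In the question's code, SS.Read(Buffer, sizeof(Buffer)) writes 4 bytes over the string variable itself (in Delphi 7 a string variable is just a pointer) instead of filling the string's contents, which is why the following StringReplace call crashes.

     Buffer: AnsiString; // holds the raw UTF-8 bytes (type UTF8String = AnsiString)
     BytesRead: Integer;
     ...
     SetLength(Buffer, BlockSize);
     BytesRead := SS.Read(PAnsiChar(Buffer)^, BlockSize);
     SetLength(Buffer, BytesRead); // the last block may be shorter than BlockSize
     ...
     ST.Write(PAnsiChar(Buffer)^, Length(Buffer));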
    

    But with this approach you can lose patterns that straddle the boundary between two blocks. Why not use a TStringList, load the whole file into it, and work with its lines?
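
    One possible way to keep block-wise reading without losing patterns at block borders is to process only complete lines from each block and carry the unfinished tail over to the next read. The following is a minimal sketch for Delphi 7, not a definitive implementation. It assumes that none of the search patterns contains a line break and that lines end with #10. The names ConvertFile, ReplaceTags, LastLineFeed, and the BlockSize constant are hypothetical, and only two of the replacements from the question are shown.

```pascal
program BlockReplace;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  BlockSize = 400 * 1024; // hypothetical value, based on the 400 KB in the question

// Applies the tag replacements to one chunk of text (only two shown here).
function ReplaceTags(const S: AnsiString): AnsiString;
begin
  Result := StringReplace(S, '<page>', '<p>', [rfReplaceAll]);
  Result := StringReplace(Result, '</page>', '</p>', [rfReplaceAll]);
end;

// Returns the index of the last line feed in S, or 0 if there is none.
function LastLineFeed(const S: AnsiString): Integer;
begin
  Result := Length(S);
  while (Result > 0) and (S[Result] <> #10) do
    Dec(Result);
end;

procedure ConvertFile(const SourceName, TargetName: string);
var
  Source, Target: TFileStream;
  Block, Pending, Chunk: AnsiString;
  BytesRead, CutPos: Integer;
begin
  Source := TFileStream.Create(SourceName, fmOpenRead or fmShareDenyWrite);
  try
    Target := TFileStream.Create(TargetName, fmCreate);
    try
      Pending := '';
      repeat
        // Allocate the string body first, then read into it.
        SetLength(Block, BlockSize);
        BytesRead := Source.Read(PAnsiChar(Block)^, BlockSize);
        SetLength(Block, BytesRead);
        Pending := Pending + Block;

        if BytesRead = 0 then
          CutPos := Length(Pending)        // end of file: flush what is left
        else
          CutPos := LastLineFeed(Pending); // otherwise process complete lines only

        if CutPos > 0 then
        begin
          Chunk := ReplaceTags(Copy(Pending, 1, CutPos));
          Delete(Pending, 1, CutPos);      // keep the unfinished tail for the next pass
          if Chunk <> '' then
            Target.WriteBuffer(PAnsiChar(Chunk)^, Length(Chunk));
        end;
      until BytesRead = 0;
    finally
      Target.Free;
    end;
  finally
    Source.Free;
  end;
end;

begin
  // Source and target file names are taken from the command line.
  ConvertFile(ParamStr(1), ParamStr(2));
end.
```

    Cutting at a line feed also avoids splitting a multi-byte UTF-8 character, because the byte #10 never occurs inside one. If the input may contain very long stretches without a line break, the pending tail grows accordingly, so this sketch fits line-oriented files such as the XML dump in the question.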