I would like to read a block of bytes from a file (e.g. 400 KB), replace some text in the buffer, and then write it to a file. Originally I tried TFileStream with a buffer declared as an array of bytes, but then I got stuck because StringReplace works only with strings. The source data is UTF-8 text. This is what I have:
var
SS,ST: TFileStream;
Buffer: string;
sf,tf,TempStr: string;
i: Integer;
begin
sf := 'U:\SYSTEM\enwiktionary-latest-stub-articles\stub-articles.xml';
tf := 'A:1.txt';
SS := TFileStream.Create(sf, fmOpenRead);
ST := TFileStream.Create(tf, fmCreate or fmOpenWrite);
try
SS.Read(Buffer, sizeof(Buffer));
Buffer := stringreplace(Buffer, '<page>','<p>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</page>','</p>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<title>','<t>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</title>','</t>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<ns>','<n', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</ns>','>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<revision>','<r>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</revision>','</r>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<id>','<i', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</id>','>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<parentid>','<pi', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</parentid>','>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<contributor>','', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</contributor>','', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<username>','<u>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</username>','</u>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<comment>','<c>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '</comment>','</c>', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '<text id="','<t =', [rfReplaceAll]);
Buffer := stringreplace(Buffer, '" bytes="',' b=', [rfReplaceAll]);
ST.Write(Buffer, sizeof(Buffer));
finally
SS.Free;
end;
The line starting with Buffer := stringreplace raises a runtime error: Access violation.
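SS.Read(Buffer, SizeOf(Buffer)) overwrites the string variable itself (which is just a pointer) instead of filling the string's contents, hence the access violation. Use a UTF-8 buffer string and allocate space for the string body before reading:
Buffer: AnsiString; // type UTF8String = AnsiString;
BytesRead: Integer;
...
SetLength(Buffer, BlockSize);
BytesRead := SS.Read(PAnsiChar(Buffer)^, BlockSize);
SetLength(Buffer, BytesRead); // the last block may be shorter than BlockSize
...
ST.Write(PAnsiChar(Buffer)^, Length(Buffer));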
But with this approach you can lose patterns that straddle block boundaries, for example when one block ends with '<pa' and the next begins with 'ge>'. Why not use TStringList, load the whole file into it, and work with its lines?
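The TStringList suggestion could look roughly like the sketch below. It is only an illustration under several assumptions: Delphi 2009 or later (for TEncoding and the Unicode string type), a file small enough to fit in memory, and tags that never span a line break. The program name TagShortener, the helper ShortenTagsInLine, and taking the file names from the command line are all hypothetical, and only the first four replacements from the question are shown.

```pascal
program TagShortener;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: applies a few of the question's replacements to one line.
function ShortenTagsInLine(const S: string): string;
begin
  Result := StringReplace(S, '<page>', '<p>', [rfReplaceAll]);
  Result := StringReplace(Result, '</page>', '</p>', [rfReplaceAll]);
  Result := StringReplace(Result, '<title>', '<t>', [rfReplaceAll]);
  Result := StringReplace(Result, '</title>', '</t>', [rfReplaceAll]);
  // ...the remaining replacements from the question would follow here...
end;

var
  Lines: TStringList;
  i: Integer;
begin
  // Assumption: ParamStr(1) is the source file, ParamStr(2) the target file.
  Lines := TStringList.Create;
  try
    // Decode the UTF-8 bytes into a proper string while loading.
    Lines.LoadFromFile(ParamStr(1), TEncoding.UTF8);
    for i := 0 to Lines.Count - 1 do
      Lines[i] := ShortenTagsInLine(Lines[i]);
    // Encode back to UTF-8 on saving.
    Lines.SaveToFile(ParamStr(2), TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end.
```

Because each line is processed as a whole, no pattern can be cut in half at a block border. The price is memory: the entire file is held as a decoded string list, so for a very large dump the block-wise approach (with some carry-over between blocks) may still be the better fit.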