Tags: stringreplace, lazarus, fpc, delphi

Lazarus: StringReplace ineffective when working with files (Unicode issue)


I'm using Lazarus to build a simple app that builds Outlook signatures based on a template. The idea is to extract the template (a ZIP file), and replace variables within the files it contains.

For example, I may want to replace {fullname} with the name provided by the user.

I am currently using the implementation below, but it has no effect: the file is read and written, yet the replacements are not made. I checked whether my use of TFileStream was at fault, but appending dummy text to the end of the output file with WriteAnsiString works.

Could you have a look at my code below and let me know what I may have done wrong, or whether there is a better alternative to StringReplace? I am aware that one can use TStringList; however, doing so breaks line endings. As memos and rich edits use TStringList, using those won't help either.

Update:

I have seen this, but using AnsiString makes no difference. If I'm not mistaken, FPC uses it by default anyway, instead of UnicodeString.

Update 2:

Indeed, AnsiString is the default. Using a unicode string (which makes the replacements work) adds ? to the beginning and end of the file. Why would it do that?


function multiStringReplace(const s: string; search, replace : array of string; flags : tReplaceFlags): string;
var c : integer;
begin
    assert(length(search) = length(replace), 'Array lengths differ.');
    result := s;
    // Apply each search/replace pair in order to the accumulated result
    for c := low(search) to high(search) do
        result := stringReplace(result, search[c], replace[c], flags);
end;

procedure fileReplaceString(const fileName: string; search, replace: array of string);
var
    fs: tFileStream;
    s: string;
begin
    // Read the whole file into s as raw bytes (no encoding conversion)
    fs := tFileStream.create(fileName, fmOpenRead or fmShareDenyNone);
    try
        setLength(s, fs.size);
        if length(s) > 0 then
            fs.readBuffer(s[1], length(s));
    finally
        fs.free();
    end;
    s := multiStringReplace(s, search, replace, [rfReplaceAll, rfIgnoreCase]);
    // fmCreate truncates the file, so no stale bytes remain if s is now shorter
    fs := tFileStream.create(fileName, fmCreate);
    try
        if length(s) > 0 then
            fs.writeBuffer(s[1], length(s));
    finally
        fs.free();
    end;
end;

Usage:

fileReplaceString(currentFile, ['{fullname}'], ['Full Name']);

Solution

  • Thanks to Abelisto's comment above, the issue turned out to be that Outlook saves the three files it creates with different encodings. To work around it, I used convertEncoding and guessEncoding from lconvencoding, as below:

    uses
        lconvencoding;
    
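    // (read the file contents into s here, as in fileReplaceString)
    // Replace, then convert from the guessed source encoding to ANSI
    s := convertEncoding(
        multiStringReplace(s, search, replace, [rfReplaceAll, rfIgnoreCase]),
        guessEncoding(s), encodingAnsi
    );
    // (write the modified and converted string back to the file here)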
    

    encodingAnsi gave the best results, at least in my case. Converting to UTF-8 (with or without a BOM) caused problems with certain characters, specifically em dashes and en dashes.
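
    For reference, here is a fuller sketch of the approach described above, assuming the LazUtils unit LConvEncoding is on the unit path. The names ReplaceInEncodedText and FileReplaceStringEncoded are hypothetical. Unlike the snippet above, this version converts to UTF-8 before replacing. That ordering is an assumption on my part: in a UTF-16 file each ASCII character occupies two bytes, so a byte-wise search for '{fullname}' cannot match until the text has been decoded. It also reopens the file with fmCreate so that the old contents are truncated. The ANSI target simply follows the choice above and may not suit other templates.

```pascal
program SignatureReplace;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  LConvEncoding; // LazUtils; provides GuessEncoding and ConvertEncoding

// Hypothetical helper: decode Raw to UTF-8, apply every search/replace
// pair, then re-encode the result as TargetEncoding.
function ReplaceInEncodedText(const Raw: string;
  const Search, Replace: array of string;
  const TargetEncoding: string): string;
var
  i: Integer;
begin
  Assert(Length(Search) = Length(Replace), 'Array lengths differ.');
  // Assumption: decode first so ASCII placeholders can match UTF-16 input.
  Result := ConvertEncoding(Raw, GuessEncoding(Raw), EncodingUTF8);
  for i := Low(Search) to High(Search) do
    Result := StringReplace(Result, Search[i], Replace[i],
      [rfReplaceAll, rfIgnoreCase]);
  Result := ConvertEncoding(Result, EncodingUTF8, TargetEncoding);
end;

// Hypothetical helper: rewrite FileName in place with the replacements applied.
procedure FileReplaceStringEncoded(const FileName: string;
  const Search, Replace: array of string);
var
  fs: TFileStream;
  s: string;
begin
  // Read the raw bytes of the file into s.
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(s, fs.Size);
    if s <> '' then
      fs.ReadBuffer(s[1], Length(s));
  finally
    fs.Free;
  end;

  // EncodingAnsi follows the choice made in the answer above.
  s := ReplaceInEncodedText(s, Search, Replace, EncodingAnsi);

  // fmCreate truncates the existing file before writing.
  fs := TFileStream.Create(FileName, fmCreate);
  try
    if s <> '' then
      fs.WriteBuffer(s[1], Length(s));
  finally
    fs.Free;
  end;
end;

begin
  if ParamCount = 1 then
    FileReplaceStringEncoded(ParamStr(1), ['{fullname}'], ['Full Name'])
  else
    WriteLn('Usage: SignatureReplace <file>');
end.
```

    Run it with the path of one extracted template file as the only argument. Since the file is overwritten in place, it is worth testing on a copy first.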