
Convert AnsiString to UnicodeString in Lazarus with FreePascal


I found similar topics here, but none of them solved my problem, so I am asking in a new thread.

A couple of days ago, I changed the format in which an application I am developing saves its preferences from INI to JSON.

I use the jsonConf unit for this.

The code I use to save a key-value pair to the file looks like this:

Procedure TMyClass.SaveSettings();
var
  c: TJSONConfig;
begin
  c:= TJSONConfig.Create(nil);
  try
    c.Filename:= m_settingsFilePath; 
    c.SetValue('/Systems/CustomName', m_customName);
  finally
    c.Free;
  end;
end;

In my code, m_customName is a variable of type AnsiString. The TJSONConfig.SetValue procedure requires both the key and the value to be of type UnicodeString. The application compiles fine, but I get warnings such as:

Warning: Implicit string type conversion from "AnsiString" to "UnicodeString"

Some of the messages also warn about potential data loss.

Of course, I could change everything to UnicodeString, but that is too risky. I haven't seen any problems so far from ignoring these warnings, but they show up all the time, and the conversions might cause problems on a different PC.

How do I fix this?


Solution

  • To avoid the warning, do an explicit conversion: that way you tell the compiler that you know what you are doing (I hope...). In the case of c.SetValue, the expected type is UnicodeString (UTF-16), and m_customName should be declared as string unless there is a good reason to do differently (see below), because other declarations may trigger unwanted internal conversions.

    In Lazarus, a string is UTF-8 encoded by default. Therefore, you can use the function UTF8Decode() to convert from UTF-8 to Unicode (UTF-16), or UTF8ToUTF16() from the unit LazUTF8.

    var
      c: TJSONConfig;
      m_customName: String;
    ...
      c.SetValue('/Systems/CustomName', UTF8Decode(m_customName));
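
    A complete, minimal program along the same lines might look like the sketch below. It assumes plain Free Pascal with the fcl-json package (which provides jsonConf); the file name settings.json and the value 'Café' are made up for illustration, and the two UTF-8 bytes of 'é' are set one by one so the result does not depend on the encoding of the source file. The cwstring unit is only pulled in on Unix, as a precaution so that the RTL has a full widestring manager. It writes the value and reads it back to check that nothing was lost.

    ```pascal
    program SaveUtf8Setting;

    {$mode objfpc}{$H+}

    uses
      {$ifdef unix}cwstring,{$endif}  // Unix only: full widestring manager (precaution)
      jsonConf;

    var
      c: TJSONConfig;
      customName: String;             // holds UTF-8 bytes, as is usual in Lazarus
      key, value: UnicodeString;
    begin
      // Hypothetical value 'Café': 'Caf' plus the two UTF-8 bytes of 'é'
      customName := 'Caf  ';
      customName[4] := AnsiChar($C3);
      customName[5] := AnsiChar($A9);

      key := '/Systems/CustomName';
      // Explicit UTF-8 -> UTF-16 conversion, so no implicit-conversion warning
      value := UTF8Decode(customName);

      c := TJSONConfig.Create(nil);
      try
        c.Filename := 'settings.json';  // hypothetical path
        c.SetValue(key, value);
        c.Flush;                        // write the JSON file to disk
        // Read the value back to check that nothing was lost
        if c.GetValue(key, UnicodeString('')) = value then
          WriteLn('round trip OK')
        else
          WriteLn('value changed');
      finally
        c.Free;
      end;
    end.
    ```

    Passing UnicodeString variables for both the key and the value means the UnicodeString overload of SetValue/GetValue is an exact match, so no implicit conversion is left for the compiler to warn about.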
    

    You say above that the key-value pairs are in a file. In that case, the conversion depends on the encoding of the file. Normally I open the file in a good text editor and look up the encoding; Notepad++, for example, displays the name of the encoding in the right corner of the status bar. Suppose the encoding is code page 1252 (Windows Latin-1). These are ansistrings, so you can declare the strings read from the file as ansistring. Because UTF-8 strings are so common in Lazarus, there is no direct conversion from ansistring to Unicode, and you must convert to UTF-8 first. The unit LConvEncoding contains many conversion routines between various encodings. Use CP1252ToUTF8() to go to UTF-8, and then apply UTF8Decode() to finally get Unicode.

    var
      c: TJSONConfig;
      m_customName: ansistring;
    ...
      c.SetValue('/Systems/CustomName', UTF8Decode(CP1252ToUTF8(m_customName)));
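
    A small, self-contained check of this two-step conversion could look like the sketch below. It assumes a Lazarus setup in which the unit LConvEncoding (part of the LazUtils package) can be found, for example a Lazarus project with LazUtils as a dependency; the single CP1252 byte $E9 ('é') is a made-up stand-in for text read from the file.

    ```pascal
    program Cp1252ToUnicode;

    {$mode objfpc}{$H+}

    uses
      LConvEncoding;  // from the LazUtils package

    var
      raw: AnsiString;       // bytes as read from a CP1252 file (hypothetical)
      value: UnicodeString;
    begin
      // 'Café' in code page 1252: 'Caf' followed by the single byte $E9
      raw := 'Caf ';
      raw[4] := AnsiChar($E9);

      // Step 1: CP1252 -> UTF-8; step 2: UTF-8 -> UTF-16
      value := UTF8Decode(CP1252ToUTF8(raw));

      // 4 bytes in CP1252 and 4 UTF-16 code units; the last should be U+00E9
      WriteLn(Length(raw), ' ', Length(value), ' ', Ord(value[4]));
    end.
    ```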
    

    Free Pascal 3.0 can handle many of these conversions automatically by using string types with predefined encodings (code pages). But I think explicit conversions make it much clearer what is happening. And FPC 3.0 still emits the warnings that you want to avoid...
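
    For comparison, here is a sketch of the FPC 3.0 approach: declare a string type that carries its code page (TCP1252String below is an assumed example for a CP1252 source) and convert it with an explicit typecast. The RTL then converts using the declared code page, and because the conversion is explicit, the compiler should not emit the implicit-conversion warning. On Unix, cwstring is included so that the RTL can actually convert code page 1252 (it delegates to iconv).

    ```pascal
    program CodePageAwareString;

    {$mode objfpc}{$H+}

    {$ifdef unix}
    uses
      cwstring;  // Unix: code page conversions via the C library's iconv
    {$endif}

    type
      // An ansistring type whose contents are declared to be code page 1252
      TCP1252String = type AnsiString(1252);

    var
      raw: TCP1252String;
      value: UnicodeString;
    begin
      // 'Café' in code page 1252: 'Caf' followed by the single byte $E9
      raw := 'Caf ';
      raw[4] := AnsiChar($E9);

      // Explicit typecast: the RTL converts from the declared code page 1252
      // to UTF-16; being explicit, it should not trigger the warning
      value := UnicodeString(raw);

      WriteLn(Ord(value[4]));  // code point of the fourth character
    end.
    ```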