
Can we safely use AnsiString on mobile with Delphi 10.4 Sydney?


The documentation page "Migrating Delphi Code to Mobile from Desktop" says to avoid using AnsiString. Is there any reason for that? AnsiString uses half the memory of UnicodeString, and it's a perfect container for JSON. So, can I use AnsiString safely, or do I need to stay with UnicodeString (and why)?


Solution

  • You can use 8-bit strings on mobile platforms, but whether that is safe depends on which kind of 8-bit string you use.

    On any platform other than Windows, and even on Windows, using AnsiString is an extremely bad idea. AnsiString is a legacy type, and while it was re-enabled on mobile platforms in 10.4, that does not mean you should use it, and even less that you can use it safely.

    One of the problems with AnsiString is that sooner or later it will go through a conversion somewhere in your code, because the default string type used throughout the RTL and FMX is the UTF-16 string type, and in that conversion you can lose the original data.

    The string types you can safely use on mobile (and other platforms) are string, UTF8String and RawByteString.

    RawByteString can only be used safely in code-page-agnostic operations. See more: Delphi XE - RawByteString vs AnsiString
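
    As a rough illustration of what "code-page agnostic" can mean, here is a minimal sketch of a routine that only ever looks at bytes and never interprets them as characters. The helper name ByteSum is hypothetical and not part of the RTL; the sketch is written as a desktop console program purely so it can be run on its own.

    ```delphi
    program RawByteDemo;
    {$APPTYPE CONSOLE}

    // Hypothetical helper: adds up the byte values of any 8-bit string.
    // It never treats the bytes as characters, so the code page of the
    // string that is passed in does not matter.
    function ByteSum(const S: RawByteString): Cardinal;
    var
      Ch: AnsiChar;
    begin
      Result := 0;
      for Ch in S do
        Inc(Result, Ord(Ch));
    end;

    var
      W: string;
      U: UTF8String;
    begin
      W := #$00E9;           // 'é' as a UTF-16 string
      U := UTF8String(W);    // two bytes in UTF-8: $C3 $A9
      Writeln(ByteSum(U));   // prints 364 ($C3 + $A9)
    end.
    ```

    Passing a UTF8String to a const RawByteString parameter does not convert it, which is why this kind of routine is safe. Anything that compares, uppercases or otherwise interprets the characters is no longer code-page agnostic.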

    JSON does not support ANSI encodings, so Unicode is your only choice. UTF-8 and UTF8String will do more than fine, because UTF-8 is also the default encoding for any JSON data exchange.
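
    A minimal sketch of keeping JSON as UTF-8 and handing the bytes straight to the parser, assuming the System.JSON unit that ships with Delphi. The JSON text and variable names are made up for illustration, and the console program form is only there to make it runnable on its own.

    ```delphi
    program JsonUtf8Demo;
    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.JSON;

    var
      Raw: UTF8String;
      Bytes: TBytes;
      Value: TJSONValue;
    begin
      // JSON text held as UTF-8, for example as received from a web service.
      Raw := UTF8String('{"name":"test"}');

      // Copy the UTF-8 bytes without any code-page conversion.
      Bytes := BytesOf(Raw);

      // This overload treats the bytes as UTF-8 by default.
      Value := TJSONObject.ParseJSONValue(Bytes, 0);
      try
        if Value <> nil then
          Writeln(Value.GetValue<string>('name'));  // prints test
      finally
        Value.Free;
      end;
    end.
    ```

    Note that values read back through the RTL arrive as the generic (UTF-16) string type, so the 8-bit representation is mainly useful for storage and transport.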

    As far as the various AnsiXXX functions are concerned, the best option is to write your own routines that work directly on UTF-8 strings. You can also use the standard functions that work on the generic string type, but they are slower because of the conversions to UTF-16 and back.
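
    As one example of such a routine, the sketch below counts Unicode code points in a UTF8String without converting to UTF-16. The function name Utf8CodePointCount is hypothetical, and the code assumes the input is well-formed UTF-8.

    ```delphi
    program Utf8CountDemo;
    {$APPTYPE CONSOLE}

    // Hypothetical helper: counts code points in well-formed UTF-8 by
    // skipping continuation bytes, which always have the bit pattern 10xxxxxx.
    function Utf8CodePointCount(const S: UTF8String): Integer;
    var
      Ch: AnsiChar;
    begin
      Result := 0;
      for Ch in S do
        if (Ord(Ch) and $C0) <> $80 then
          Inc(Result);
    end;

    var
      W: string;
      U: UTF8String;
    begin
      W := 'a'#$00E9#$65B0;            // 'a', 'é', '新'
      U := UTF8String(W);
      Writeln(Length(U));              // prints 6 (1 + 2 + 3 bytes)
      Writeln(Utf8CodePointCount(U));  // prints 3
    end.
    ```

    Length on a UTF8String reports bytes, not characters, which is the main thing to keep in mind when porting code that used the AnsiXXX functions.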


    Illustration of data loss when using AnsiString on mobile (Android)

    The Android specification requires implementations to support only a few standard charsets. That includes ISO-8859-1.

    https://developer.android.com/reference/java/nio/charset/Charset

    For anything else you depend on the specific device.

    For instance, the following example with AnsiString works fine for the French character set, but it fails for Croatian and Chinese.

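    var
      s: string;      // UTF-16
      u: UTF8String;  // UTF-8
      a: AnsiString;  // 8-bit ANSI
    begin
      // French: all three lines display correctly
      s := 'é à è ù â ê î ô û ç ë ï ü';
      a := s;  // implicit cast, W1058 (potential data loss)
      u := s;  // implicit cast, W1057 (no data loss)
      Memo1.Lines.Add(s);
      Memo1.Lines.Add(u);
      Memo1.Lines.Add(a);

      // Croatian: the AnsiString line is damaged
      s := 'š đ č ć ž Š Đ Č Ć Ž';
      a := s;
      u := s;
      Memo1.Lines.Add(s);
      Memo1.Lines.Add(u);
      Memo1.Lines.Add(a);

      // Chinese: the AnsiString line is damaged
      s := '新年';
      a := s;
      u := s;
      Memo1.Lines.Add(s);
      Memo1.Lines.Add(u);
      Memo1.Lines.Add(a);
    end;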
    

    [Screenshot: different character encodings displayed on an Android device]

    The Delphi compiler issues a warning when you perform an implicit cast between string types where data loss can occur, and it is prudent to fix all such code by using some other string type.

    W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'
    

    There is also a warning when you convert directly between the UTF-8 and UTF-16 string types. To clear those warnings you can just explicitly typecast to string or UTF8String, since the compiler will do the appropriate conversion in the background and all information will be retained (Note: Unicode normalization may occur during that process).

    W1057 Implicit string cast from 'string' to 'UTF8String'
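
    A minimal sketch of the explicit typecasts that clear W1057, written as a small console program so it can be run on its own. The variable names are illustrative only.

    ```delphi
    program ExplicitCastDemo;
    {$APPTYPE CONSOLE}

    var
      Original, RoundTrip: string;
      U: UTF8String;
    begin
      Original := #$0161#$010D#$017E;  // 'ščž'

      // Explicit casts: the compiler still converts between UTF-16 and
      // UTF-8, but no longer reports W1057.
      U := UTF8String(Original);
      RoundTrip := string(U);

      Writeln(Length(U));             // prints 6 (each character takes 2 bytes in UTF-8)
      Writeln(RoundTrip = Original);  // prints TRUE
    end.
    ```

    The same explicit cast to AnsiString would also silence W1058, but it would only hide the warning and not prevent the data loss.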