When I read Migrating Delphi Code to Mobile from Desktop, they say to avoid using AnsiString. Is there any reason for that? AnsiString uses half the memory of UnicodeString, and it's a perfect container for JSON. So, can I use AnsiString safely, or do I need to stay with UnicodeString (and why)?
You can use 8-bit strings on mobile platforms. But safety depends on which kind of 8-bit strings you use.
For anything other than Windows, and even on Windows, using AnsiString is an extremely bad idea. AnsiString is a legacy type, and while it was re-enabled on mobile platforms in 10.4, that does not mean you should use it, and even less that you can use it safely.
One of the problems with AnsiString is that sooner or later in your code it will go through a conversion, because the default string type used all over the RTL and FMX is the UTF-16 string type, and that conversion can lose the original data.
String types you can safely use on mobile (and other platforms) are string, UTF8String and RawByteString.
When it comes to RawByteString, it can only be safely used in code-page agnostic operations. See more: Delphi XE - RawByteString vs AnsiString
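As a rough illustration of what "code-page agnostic" means here, the sketch below only looks at raw byte values and never interprets them as characters. CountByte is a hypothetical helper written for this example, not an RTL routine.

```delphi
program RawByteSketch;

{$APPTYPE CONSOLE}

// Hypothetical helper: counts how often one byte value occurs.
// The bytes are never treated as characters, so the code page does not matter.
function CountByte(const Data: RawByteString; Value: Byte): Integer;
var
  P: PByte;
  I: Integer;
begin
  Result := 0;
  P := PByte(Pointer(Data)); // nil for an empty string; the loop then does not run
  for I := 0 to Length(Data) - 1 do
  begin
    if P^ = Value then
      Inc(Result);
    Inc(P);
  end;
end;

var
  U: UTF8String;
begin
  U := '{"a":1,'#10'"b":2}'#10;
  // A UTF8String passed to a const RawByteString parameter is not converted.
  Writeln(CountByte(U, 10)); // line-feed bytes in the payload
end.
```

Searching for an ASCII byte such as line feed is safe in UTF-8 data because every byte of a multi-byte sequence has a value of $80 or higher.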
JSON files don't support ANSI encoding, so Unicode is your only choice. UTF-8 and UTF8String will do more than fine, because that is also the default encoding for any JSON data exchange.
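A minimal sketch of keeping JSON in UTF8String, assuming Delphi 10.4 or later and a payload that is already valid UTF-8. BytesToUtf8 is a hypothetical helper, and the TEncoding.UTF8.GetBytes call only stands in for bytes that would normally come from a file or a server.

```delphi
program Utf8JsonSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: wraps bytes that are already UTF-8 in a UTF8String
// without a round trip through UTF-16.
function BytesToUtf8(const Raw: TBytes): UTF8String;
begin
  SetLength(Result, Length(Raw));
  if Length(Raw) > 0 then
    Move(Raw[0], Pointer(Result)^, Length(Raw));
end;

var
  Raw: TBytes;
  Json: UTF8String;
  Display: string;
begin
  // Stand-in for a payload received from a server or read from a file.
  Raw := TEncoding.UTF8.GetBytes('{"name":"test","id":1}');
  Json := BytesToUtf8(Raw);

  // Convert to the default UTF-16 string only where the RTL or FMX needs it.
  Display := string(Json);

  Writeln(Length(Json), ' bytes as UTF8String');
  Writeln(Length(Display) * SizeOf(Char), ' bytes as string');
end.
```

For mostly-ASCII JSON this keeps the memory advantage the question is after, without the data loss risk of AnsiString.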
As far as the various AnsiXXX functions are concerned, the best option is to write your own routines that work directly on UTF-8 strings. You can also use the standard functions that work on the generic string type, but they are slower because of the conversions to UTF-16 and back.
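Such a routine can be quite small. The sketch below counts code points in a UTF8String by skipping continuation bytes (those of the form 10xxxxxx). Utf8CodePointCount is a hypothetical name, and the routine assumes the input is valid UTF-8; it counts code points, not user-perceived characters.

```delphi
program Utf8CountSketch;

{$APPTYPE CONSOLE}

// Hypothetical helper: counts code points directly on the UTF-8 bytes,
// so no conversion to UTF-16 takes place. Assumes valid UTF-8 input.
function Utf8CodePointCount(const S: UTF8String): Integer;
var
  P: PByte;
  I: Integer;
begin
  Result := 0;
  P := PByte(Pointer(S)); // nil for an empty string; the loop then does not run
  for I := 0 to Length(S) - 1 do
  begin
    // Every byte that is not a continuation byte starts a new code point.
    if (P^ and $C0) <> $80 then
      Inc(Result);
    Inc(P);
  end;
end;

var
  W: string;
  U: UTF8String;
begin
  W := 'caf' + #$00E9; // "café", written with an escape to keep the source ASCII
  U := UTF8String(W);
  Writeln('bytes: ', Length(U));
  Writeln('code points: ', Utf8CodePointCount(U));
end.
```

Walking the bytes through a pointer avoids any dependence on whether string indexing is zero-based or one-based on the target platform.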
Illustration of data loss when using AnsiString on mobile (Android)
The Android specification requires implementations to support only a few standard charsets. That includes ISO-8859-1: https://developer.android.com/reference/java/nio/charset/Charset
For anything else you depend on the specific device.
For instance, the following example with AnsiString works fine for the French character set, but it fails for Croatian and Chinese.
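var
  s: string;
  u: UTF8String;
  a: AnsiString;
begin
  // French: the AnsiString copy displays correctly
  s := 'é à è ù â ê î ô û ç ë ï ü';
  a := s; // implicit cast, compiler warns with W1058
  u := s; // implicit cast, compiler warns with W1057
  Memo1.Lines.Add(s);
  Memo1.Lines.Add(u);
  Memo1.Lines.Add(a);

  // Croatian: the AnsiString copy loses data
  s := 'š đ č ć ž Š Đ Č Ć Ž';
  a := s;
  u := s;
  Memo1.Lines.Add(s);
  Memo1.Lines.Add(u);
  Memo1.Lines.Add(a);

  // Chinese: the AnsiString copy loses data
  s := '新年';
  u := s;
  a := s;
  Memo1.Lines.Add(s);
  Memo1.Lines.Add(u);
  Memo1.Lines.Add(a);
end;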
The Delphi compiler will issue a warning when you perform an unsafe typecast between string types where data loss can occur, and it is prudent to fix all such code by using some other string type.
W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'
There is also a warning when you directly convert between UTF-8 and UTF-16 string types, but to clear those warnings you can just explicitly typecast to the string or UTF8String type, since the compiler will do the appropriate conversion in the background and all information will be retained (Note: Unicode normalization may occur during that process).
W1057 Implicit string cast from 'string' to 'UTF8String'
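A short sketch of the explicit casts described above. The sample text is arbitrary, and the comparison at the end only checks that the round trip gives back the same string for this input.

```delphi
program ExplicitCastSketch;

{$APPTYPE CONSOLE}

var
  S, Back: string;
  U: UTF8String;
begin
  S := 'caf' + #$00E9;  // "café", written with an escape to keep the source ASCII

  U := UTF8String(S);   // explicit cast instead of U := S, which triggers W1057
  Back := string(U);    // explicit cast back to the UTF-16 string type

  if Back = S then
    Writeln('round trip kept the text')
  else
    Writeln('round trip changed the text');
end.
```

The same explicit cast to AnsiString would also silence W1058, but it would only hide the warning, not prevent the data loss.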