
Delphi 7 and decode UTF-8 base64


In Delphi 7, I have a Base64-encoded WideString (received from a web service that returns a WideString result):

PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==

When I decode it, the result is garbled instead of being read as UTF-8:

<?xml version="1.0"?>
<string>طھط³طھ</string>

But when I decode it with base64decode.org, the result is correct:

<?xml version="1.0"?>
<string>تست</string>

I use the DecodeString function from the EncdDecd unit.


Solution

  • The problem is that you are using DecodeString. In Delphi 7, that function returns the decoded bytes in an AnsiString, so they are treated as ANSI-encoded text. Your text, however, is UTF-8 encoded.

    To continue with the EncdDecd unit you have a couple of options. You can switch to DecodeStream. For instance, this code will produce a UTF-8 encoded text file with your data:

    {$APPTYPE CONSOLE}
    
    uses
      Classes,
      EncdDecd;
    
    const
      Data = 'PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==';
    
    var
      Input: TStringStream;
      Output: TFileStream;
    
    begin
      Input := TStringStream.Create(Data);
      try
        Output := TFileStream.Create('C:\desktop\out.txt', fmCreate);
        try
          DecodeStream(Input, Output);
        finally
          Output.Free;
        end;
      finally
        Input.Free;
      end;
    end.
    

    Or you could continue with DecodeString, but then immediately decode the UTF-8 text to a WideString. Like this:

    {$APPTYPE CONSOLE}
    
    uses
      Classes,
      EncdDecd;
    
    const
      Data = 'PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==';
    
    var
      Utf8: AnsiString;
      wstr: WideString;
    
    begin
      Utf8 := DecodeString(Data);
      wstr := UTF8Decode(Utf8);
    end.
    

    If the decoded text can be represented in your application's prevailing ANSI locale then you can convert that WideString to a plain AnsiString.

    var
      wstr: WideString;
      str: string; // alias to AnsiString
    ....
    wstr := ... // as before
    str := wstr;
    

    However, I really don't think that using ANSI encoded text is going to lead to a very fruitful programming life. I encourage you to embrace Unicode solutions.

    Judging by the content of the decoded data, it is XML, which is usually handed to an XML parser. Most XML parsers accept UTF-8 encoded data, so you can quite probably base64 decode to a memory stream using DecodeStream and then hand that stream to your XML parser. That way you don't need to decode the UTF-8 to text yourself and can let the XML parser deal with that aspect.
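
    A minimal sketch of that last approach, assuming you use the TXMLDocument framework that ships with Delphi 7 (the XMLDoc, XMLIntf and msxmldom units, backed by MSXML). The choice of parser and the variable names Doc and Value are my assumptions, not something stated in the question; any parser that can load from a TStream should work in a similar way.

    {$APPTYPE CONSOLE}

    uses
      Classes,
      ActiveX,   // CoInitialize / CoUninitialize, required by the MSXML-based DOM
      msxmldom,  // registers the MSXML DOM vendor
      XMLIntf,   // IXMLDocument
      XMLDoc,    // NewXMLDocument
      EncdDecd;

    const
      Data = 'PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==';

    var
      Input: TStringStream;
      Output: TMemoryStream;
      Doc: IXMLDocument;   // assumed parser: Delphi's own XML document wrapper
      Value: WideString;

    begin
      CoInitialize(nil);
      try
        Input := TStringStream.Create(Data);
        try
          Output := TMemoryStream.Create;
          try
            // Base64 decode to raw bytes; no ANSI interpretation takes place.
            DecodeStream(Input, Output);
            // Rewind so that the parser reads from the start of the stream.
            Output.Position := 0;
            Doc := NewXMLDocument;
            // The parser works out the encoding itself (UTF-8 by default).
            Doc.LoadFromStream(Output);
            // Text of the root <string> element, as a WideString.
            Value := Doc.DocumentElement.Text;
          finally
            // Release the COM-backed document before CoUninitialize runs.
            Doc := nil;
            Output.Free;
          end;
        finally
          Input.Free;
        end;
      finally
        CoUninitialize;
      end;
    end.

    Here the decoded bytes never pass through an AnsiString, so there is no opportunity for them to be misread as ANSI text. In a VCL forms application the CoInitialize and CoUninitialize calls are normally not needed, because the main thread has usually initialized COM already.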