I am trying to encode the 'subject' field of an email, written in Hebrew, into Base64 so that the subject can be read correctly in all browsers. At the moment I am using the Windows-1255 encoding, which works on some clients but not all, so I want to use UTF-8 with Base64 instead.
My reading on the subject (no pun intended) shows that the text has to be in the form
=?<charset>?<encoding>?<encoded text>?=
e.g.
=?windows-1255?Q?=E0=E1?=
I have taken encoded subject lines from letters which were sent to me in Hebrew with UTF-8 and Base64 ('B') encoding and decoded them successfully on this website, www.webatic.com/run/convert/base64.php. I have also used this website to encode simple letters and have noticed that the encoding it returns is not the same as the result which I get from a Delphi algorithm.
So - I am looking for an algorithm which successfully encodes letters such as aleph (ord=224), bet (ord=225), etc. According to the website, the string composed of the two letters aleph and bet returns the code 15DXkQ==, but the basic Delphi algorithm returns Ue4 and the TIdEncoderQuotedPrintable component returns =E0=E1 (which is the quoted-printable form of the single-byte values 224 and 225, as used by Windows-1255 and ISO-8859-8).
Edit (after several comments):
I asked a friend to send me an email from her Mac computer, which unsurprisingly uses UTF-8 encoding (as opposed to Windows-1255). The subject was one letter, aleph, ord 224. The encoded subject appeared in the email's header as follows
=?UTF-8?B?15A=?=
This can be separated into three parts: the 'prefix' (=?UTF-8?B?), which means that UTF-8 with Base64 encoding is being used; the 'payload' (15A=), which the web site I quoted correctly translates as the letter aleph; and the 'suffix' (?=).
I need an algorithm to translate an arbitrary string of letters, most of which will be in Hebrew (and thus with ord >= 224) into base64/utf-8; a correct solution is one that decodes correctly on the web site quoted.
You do not need to encode the Subject property manually at all. TIdMessage encodes it automatically for you. Simply assign the Edit1.Text value as-is to the Subject and let TIdMessage encode it as needed.
If you want to customize how TIdMessage encodes headers, use the TIdMessage.OnInitializeISO event to provide the desired charset and encoding values. In Delphi 2009+, it defaults to UTF-8 and Base64. In earlier versions, TIdMessage reads the RTL's current OS language and chooses some default values for known languages. However, Hebrew is not one of them, and so ISO-8859-1 and QuotedPrintable would end up being used. You can override those values, e.g.:
email.Subject := Edit1.Text;
.
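procedure TForm1.emailInitializeISO(var VHeaderEncoding: Char; var VCharSet: string);
begin
  // 'B' selects Base64 header encoding ('Q' would select quoted-printable)
  VHeaderEncoding := 'B';
  // Charset name written into the encoded header
  VCharSet := 'UTF-8';
end;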
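If you ever want to build such a header by hand, for example to compare against the online decoder mentioned in the question, the following is a minimal sketch rather than what TIdMessage does internally. It assumes a Unicode Delphi with System.NetEncoding available (XE7 or later), and EncodeSubjectUtf8B is a hypothetical helper name, not part of Indy. It only handles short subjects: RFC 2047 limits each encoded-word to 75 characters, and this sketch does not split longer text into several encoded-words.

```delphi
program EncodeSubjectDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

// Hypothetical helper: wraps a short subject in a single
// RFC 2047 encoded-word using UTF-8 and Base64 ('B').
function EncodeSubjectUtf8B(const S: string): string;
var
  Utf8Bytes: TBytes;
begin
  // Convert the UTF-16 Delphi string to UTF-8 bytes first;
  // Base64 must be applied to those bytes, not to ANSI code points.
  Utf8Bytes := TEncoding.UTF8.GetBytes(S);
  Result := '=?UTF-8?B?' +
    TNetEncoding.Base64.EncodeBytesToString(Utf8Bytes) + '?=';
end;

begin
  // Aleph is U+05D0, i.e. the UTF-8 bytes D7 90, which Base64-encode
  // to 15A= (the payload seen in the header quoted in the question).
  Writeln(EncodeSubjectUtf8B(#$05D0));
  // Aleph followed by bet (U+05D1).
  Writeln(EncodeSubjectUtf8B(#$05D0#$05D1));
end.
```

The key point is the order of operations: the text is converted to UTF-8 bytes and only then Base64-encoded. Encoding the Windows-1255 byte values 224 and 225 directly gives a different result, which would explain why a plain Base64 routine does not match the website.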