
How to send an email containing Greek characters using TIdMessage and Delphi XE *UPDATED*


We want to send the following HTM file as the body of an email, using Delphi XE and Indy's TIdMessage component:

<html>

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1253">
<meta name=Generator content="Microsoft Word 12 (filtered)">
<style>
<!--
 /* Font Definitions */
 @font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Tahoma;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0cm;
    margin-bottom:.0001pt;
    font-size:12.0pt;
    font-family:"Times New Roman","serif";
    color:black;}
.MsoChpDefault
    {font-size:10.0pt;}
@page Section1
    {size:595.3pt 841.9pt;
    margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.Section1
    {page:Section1;}
-->
</style>

</head>

<body bgcolor=white lang=EL>

<div class=Section1>

<p class=MsoNormal><span lang=EN-US style='font-family:"Tahoma","sans-serif"'>Abcd</span><span
lang=EN-US style='font-family:"Tahoma","sans-serif"'> </span><span
style='font-family:"Tahoma","sans-serif"'>αβγδ ά&#8118;&#8048;&#7938; </span></p>

</div>

</body>

</html>

(OK, the actual file is different, but the problem is the same.)

If you save the above file as temp.htm and open it in Internet Explorer, you'll see 4 Latin characters, 4 Greek characters without accents and 4 Greek characters with accents (variations of alpha, the first letter of the Greek alphabet), something like this:

Abcd αβγδ άᾶὰἂ

So far, so good.

If we load the above file into the Body property of the TIdMessage and send it by email, it shows up like this:

Abcd ???? ?ᾶὰἂ

As you can see, the Greek letters from the monotonic alphabet are replaced with ???? ? (tested with Mozilla Thunderbird 3 on Windows XP).

The properties of the TIdMessage component are as follows:

[Screenshot: TIdMessage properties]

I tried setting CharSet to windows-1253, but with no luck.

Any ideas how this can work?

UPDATE:

Answering your questions:

Here is the raw message source as received (email addresses redacted):

From - Thu Sep 15 11:11:06 2011
X-Account-Key: account3
X-UIDL: 00007715
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00400000
X-Mozilla-Keys:                                                                                 
Return-Path: [redacted]
X-Envelope-To: [redacted]
X-Spam-Status: No, hits=0.0 required=5.0
    tests=AWL: 0.194,BAYES_20: -0.73,HTML_MESSAGE: 0.001,
    MIME_HEADER_CTYPE_ONLY: 0.56,MIME_HTML_ONLY: 0.001,MISSING_MID: 0.001,
    CUSTOM_RULE_FROM: ALLOW,TOTAL_SCORE: 0.027,autolearn=no
X-Spam-Level: 
Received: from localhost ([127.0.0.1])
    by [redacted]
    for [redacted];
    Thu, 15 Sep 2011 11:10:59 +0300
From: [redacted]
Subject: Test msg
To: [redacted]
Content-Type: text/html; charset=us-ascii
Sender: [redacted]
Reply-To: [redacted]
Disposition-Notification-To: [redacted]
Return-Receipt-To: [redacted]
Date: Thu, 15 Sep 2011 11:10:59 +0300

<html>

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1253">
<meta name=Generator content="Microsoft Word 12 (filtered)">
<style>
<!--
 /* Font Definitions */
 @font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Tahoma;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0cm;
    margin-bottom:.0001pt;
    font-size:12.0pt;
    font-family:"Times New Roman","serif";
    color:black;}
.MsoChpDefault
    {font-size:10.0pt;}
@page Section1
    {size:595.3pt 841.9pt;
    margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.Section1
    {page:Section1;}
-->
</style>

</head>

<body bgcolor=white lang=EL>

<div class=Section1>

<p class=MsoNormal><span lang=EN-US style='font-family:"Tahoma","sans-serif"'>Abcd</span><span
lang=EN-US style='font-family:"Tahoma","sans-serif"'> </span><span
style='font-family:"Tahoma","sans-serif"'>???? ?&#8118;&#8048;&#7938; </span></p>

</div>

</body>

</html>

Thunderbird also reports Message Encoding: Western (ISO-8859-1). I tried setting different encodings on the IdMessage component, such as windows-1253 (Greek) and UTF-8, with the same result. I also converted the HTM file to UTF-8 using Notepad++ (and changed the charset in the HTML's meta tag by hand); the file looked the same. I sent the message again, and the result was: Abcd ???2?3?? ??ᾶὰἂ


Solution

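  • If you look at your own screenshots, you will see that both the TIdMessage and the transmitted email use US-ASCII as the CharSet (note the Content-Type: text/html; charset=us-ascii header in the raw source). That is why your data is getting altered: US-ASCII cannot represent the Greek letters, so each one is replaced with a ?, while the &#...; entities survive because they are plain ASCII text.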

    If you load the HTML into the TIdMessage.Body or TIdText.Body property, you have to decode the file's windows-1253 bytes into UTF-16 while loading (since that is what the Body property uses in XE), and then set the TIdMessage.CharSet or TIdText.CharSet property to windows-1253 so that the UTF-16 data is re-encoded properly when the email is sent, e.g.:

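    // Enc is a local variable holding the encoding returned by CharsetToEncoding
    Enc := CharsetToEncoding('windows-1253');
    try
      // decode the file's windows-1253 bytes into the UTF-16 Body
      IdMessage.Body.LoadFromFile('file.htm', Enc);
      IdMessage.ContentType := 'text/html';
      // Body is re-encoded to windows-1253 when the message is sent
      IdMessage.CharSet := 'windows-1253';
    finally
      Enc.Free;
    end;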
    

    Or:

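    // Enc and Part (a TIdText) are local variables
    Enc := CharsetToEncoding('windows-1253');
    try
      // add a text/html part to the message
      Part := TIdText.Create(IdMessage.MessageParts, nil);
      // decode the file's windows-1253 bytes into the UTF-16 Body
      Part.Body.LoadFromFile('file.htm', Enc);
      Part.ContentType := 'text/html';
      // Body is re-encoded to windows-1253 when the message is sent
      Part.CharSet := 'windows-1253';
    finally
      Enc.Free;
    end;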
    

    If you load the HTML into a TIdAttachment object instead, then you don't have to decode/encode anything manually, since the attachment data is sent as-is.

    // Attachment is a local TIdAttachmentFile variable; the file's bytes are sent unchanged
    Attachment := TIdAttachmentFile.Create(IdMessage.MessageParts, 'file.htm');
    Attachment.ContentType := 'text/html';
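The first approach can be put together into a small console program. This is only a sketch under stated assumptions: it swaps Indy's CharsetToEncoding for the RTL's TEncoding.GetEncoding (which returns a new object the caller frees), the helper LoadHtmlBody, the file name temp.htm and the output name test.eml are hypothetical, and setting ContentTransferEncoding to quoted-printable is an extra choice not made in the answer above (it keeps the 8-bit Greek bytes safe on 7-bit mail relays). Instead of sending, it saves the message with TIdMessage.SaveToFile so you can check that the Content-Type header should now declare charset=windows-1253 rather than us-ascii.

```pascal
program BuildGreekHtmlMessage;

{$APPTYPE CONSOLE}

uses
  SysUtils,   // TEncoding
  IdMessage;  // TIdMessage

// Hypothetical helper: decodes an HTML file saved in code page ACodePage
// into the UTF-16 Body, and declares ACharSet so Indy re-encodes the Body
// with that charset when the message is generated.
procedure LoadHtmlBody(AMsg: TIdMessage; const AFileName: string;
  ACodePage: Integer; const ACharSet: string);
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(ACodePage); // new instance, freed below
  try
    AMsg.Body.LoadFromFile(AFileName, Enc); // file bytes -> UTF-16
    AMsg.ContentType := 'text/html';
    AMsg.CharSet := ACharSet;               // UTF-16 -> ACharSet on output
    // Assumption: quoted-printable so non-ASCII bytes pass 7-bit relays
    AMsg.ContentTransferEncoding := 'quoted-printable';
  finally
    Enc.Free;
  end;
end;

var
  Msg: TIdMessage;
begin
  Msg := TIdMessage.Create(nil);
  try
    // File saved as windows-1253, as in the question
    LoadHtmlBody(Msg, 'temp.htm', 1253, 'windows-1253');
    // For the UTF-8 copy of the file instead:
    // LoadHtmlBody(Msg, 'temp.htm', 65001, 'utf-8');
    Msg.Subject := 'Test msg';
    // Write the generated message to disk to inspect its headers
    Msg.SaveToFile('test.eml');
  finally
    Msg.Free;
  end;
end.
```

To actually send it, the same TIdMessage would be passed to a TIdSMTP component's Send method; the charset handling above does not change that step.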