Tags: c#, smtp, mime, adodb, cdo.message

CDO.Message.Fields[].Name contains weird characters


I'm reading an email file whose first line (and therefore the first header line) is:

X-RCPT-TO-LIST: 1,2,3

I'm loading it using CDO and ADODB like this:

        ADODB.Stream stream = new ADODB.Stream();
        stream.Open(Type.Missing, ADODB.ConnectModeEnum.adModeUnknown, ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified, String.Empty, string.Empty);
        stream.LoadFromFile(filename);
        stream.Flush();
        CDO.Message msg = new CDO.Message();
        msg.DataSource.OpenObject(stream, "_Stream");
        msg.DataSource.Save();

Then I'm trying to get the field like this:

        ADODB.Field f = msg.Fields["urn:schemas:httpmail:X-RCPT-TO-LIST"];

This does not work: it returns an empty field (null value).

Looking at the fields in the debugger, I see that the field name is:

urn:schemas:mailheader:ÿþx-rcpt-to-list

I assume my code might work if I look for those weird characters, but I'm worried they might change from one email to the next. Any ideas why those strange characters are added? Is there a better way to access custom header fields (without reading the file myself and parsing it)?

I'm running this test on Windows XP with all of the latest patches (SP3 I think).

Sorry if I tagged this wrong; I had trouble finding tags for it. I'm using C#, in case that isn't obvious.

Here is the entire email file. I removed some junk (partly for privacy reasons), but I retested with this exact version and got the same results:

X-RCPT-TO-LIST: 1,2,3
Received: by mail-ia0-f172.google.com with SMTP id l29so4135896iag.3
        for <423a777e2af27f463b801fe2eb2242cbdf1d934000000001@users.domain.com>; Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.50.195.134 with SMTP id ie6mr6320542igc.6.1364007120542;
 Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
Received: by 10.50.169.39 with HTTP; Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
Date: Fri, 22 Mar 2013 19:52:00 -0700
Message-ID: <XXXXXXXX63pPLB9QYu=04W3mU3Ynhkjf2bdYYZqv5oVvQ__u1vg@mail.gmail.com>
Subject: test4
From: <[email protected]>
To: 423a777e2af27f463b801fe2eb2242cbdf1d934000000001 <423a777e2af27f463b801fe2eb2242cbdf1d934000000001@users.domain.com>
Content-Type: multipart/alternative; boundary=14dae9340b45e63f6204d88ea7fa

--14dae9340b45e63f6204d88ea7fa
Content-Type: text/plain; charset=UTF-8

test4

-- 
[email protected]
I don't check *this account* very often

--14dae9340b45e63f6204d88ea7fa
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">test4<br clear=3D"all"><div><br>-- <br><div><a href=3D"mai=
lto:[email protected]" target=3D"_blank">[email protected]</a></div>
<div>I don&#39;t check <b>this account</b> very often</div>
<div>=C2=A0</div>
</div></div>

--14dae9340b45e63f6204d88ea7fa--

The X-RCPT-TO-LIST line is added by code in my email server that translates the RCPT TO:<> lines into internal user IDs. That way, the thread that processes these files later knows where to place the mail. I don't want to keep this info in a separate file or anything like that, because I like my current design. I just want to know why CDO/ADODB is turning my header name into this weird form, as if there were some Unicode vs. ASCII mix-up.
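For illustration, here is a minimal C# sketch of such a prepend step. It assumes the server already holds the message as raw bytes and has the list of internal IDs; `RecipientHeader.Prepend` is a hypothetical helper, not the poster's actual server code. It encodes the header line with `Encoding.ASCII` and copies bytes directly, so no text-encoding preamble is written in front of the `X`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RecipientHeader
{
    // Hypothetical helper: returns a copy of the raw message with
    // "X-RCPT-TO-LIST: <ids>" prepended as the first header line.
    public static byte[] Prepend(byte[] rawMessage, IEnumerable<int> userIds)
    {
        string line = "X-RCPT-TO-LIST: " + string.Join(",", userIds) + "\r\n";
        // Encoding.ASCII.GetBytes emits no preamble, so nothing precedes the 'X'.
        byte[] header = Encoding.ASCII.GetBytes(line);
        byte[] result = new byte[header.Length + rawMessage.Length];
        Buffer.BlockCopy(header, 0, result, 0, header.Length);
        Buffer.BlockCopy(rawMessage, 0, result, header.Length, rawMessage.Length);
        return result;
    }

    public static void Main()
    {
        byte[] raw = Encoding.ASCII.GetBytes("Subject: test4\r\n\r\ntest4\r\n");
        byte[] withHeader = Prepend(raw, new[] { 1, 2, 3 });
        // The first four bytes are "X-RC" in ASCII: prints 58-2D-52-43
        Console.WriteLine(BitConverter.ToString(withHeader, 0, 4));
    }
}
```

The result can then be saved with `File.WriteAllBytes`, which writes the bytes exactly as given.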


Solution

  • "ÿþ" at the start of a text stream is, most of the time, a so-called byte order mark (BOM); see e.g. the Wikipedia entry. It appears in the stream because it is in the file being read. The BOM will show up if you open the file in a hex editor and check its first bytes: "ÿþ" is how the two bytes 0xFF 0xFE (the UTF-16 little-endian BOM) look when each byte is displayed as a single Latin-1 character.

    Why are these bytes in the file in the first place? That depends on how the file was created. This question may be helpful: Can I export excel data with UTF-8 without BOM?
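To check for the BOM without a hex editor, here is a small C# sketch. The class and method names (`BomCheck`, `DetectBom`, `ReadHead`) are hypothetical and not part of CDO or ADODB. It confirms that the bytes 0xFF 0xFE, decoded one byte per character as ISO-8859-1, give "ÿþ". It then reads the first bytes of a file and reports which BOM, if any, the file starts with. With no argument, it writes a demo file using `Encoding.Unicode`, which is one way a UTF-16LE BOM can end up in a file. Whether the poster's server writes files that way is an assumption, not something the question shows.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomCheck
{
    // Names the byte order mark at the start of 'head', or returns null if none is found.
    // Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and is reported as UTF-16LE here.
    public static string DetectBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16BE";
        return null;
    }

    // Reads at most 'count' bytes from the start of a file, like a hex editor's first row.
    public static byte[] ReadHead(string path, int count)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buffer = new byte[count];
            int total = 0, n;
            while (total < count && (n = fs.Read(buffer, total, count - total)) > 0)
                total += n;
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }

    public static void Main(string[] args)
    {
        // Decoding FF FE one byte per character (ISO-8859-1) yields U+00FF U+00FE,
        // i.e. the "ÿþ" seen in the CDO field name.
        string shown = Encoding.GetEncoding("iso-8859-1").GetString(new byte[] { 0xFF, 0xFE });
        Console.WriteLine(shown == "\u00FF\u00FE"); // True

        // Check a real file if a path is given; otherwise create a demo file with
        // Encoding.Unicode (UTF-16LE), whose preamble is FF FE.
        string path = args.Length > 0 ? args[0] : Path.Combine(Path.GetTempPath(), "bom-demo.eml");
        if (args.Length == 0)
            File.WriteAllText(path, "X-RCPT-TO-LIST: 1,2,3\r\n", Encoding.Unicode);

        byte[] head = ReadHead(path, 4);
        Console.WriteLine(BitConverter.ToString(head));   // demo file: FF-FE-58-00
        Console.WriteLine(DetectBom(head) ?? "no BOM");   // demo file: UTF-16LE
    }
}
```

Run it against the actual .eml file to see whether it really starts with FF FE before the `X-RCPT-TO-LIST` text.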