
The precise format of Content-Id header


I'm really confused when it comes to the format of Content-Id headers in message parts.

It seems to me that only RFC 2045 covers the format of the header, however briefly:

In constructing a high-level user agent, it may be desirable to allow one body to make reference to another. Accordingly, bodies may be labelled using the "Content-ID" header field, which is syntactically identical to the "Message-ID" header field:

 id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be generated to be world-unique.

RFC 2822 explains the format of a msg-id token like so:

The message identifier (msg-id) is similar in syntax to an angle-addr construct without the internal CFWS.

message-id = "Message-ID:" msg-id CRLF

in-reply-to = "In-Reply-To:" 1*msg-id CRLF

references = "References:" 1*msg-id CRLF

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

id-left = dot-atom-text / no-fold-quote / obs-id-left

id-right = dot-atom-text / no-fold-literal / obs-id-right

no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE

no-fold-literal = "[" *(dtext / quoted-pair) "]"

Long story short: it includes the at ('@') symbol, just like the Message-Id header of a message. However, almost all reader-friendly articles on the MIME format give examples of Content-Id values without the at symbol (including not-really-global identifiers like myimagecid or inlineimage001, as well as randomly generated UUIDs with no at symbol). They would surely stress the importance of the '@' symbol if it were necessary, just like they do with the Message-Id header, right? Right?

I've run some tests on real-world email clients to see how they compose emails with embedded inline images:

  • Thunderbird generates identifiers with the at symbol. Example: part1.12345678.12345678@domain.example.com
  • Gmail generates identifiers without the at symbol and with no domain part. Example: ii_abc1234x0_12345ab12abcdefa
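
To make the difference concrete, here is a rough Python sketch that checks a value against a simplified form of the msg-id grammar quoted earlier. It only accepts dot-atom-text on both sides of the '@' and deliberately ignores no-fold-quote, no-fold-literal, the obsolete forms, and surrounding CFWS. The function name is_simple_msg_id is made up for this example, and the angle brackets around the two sample identifiers are an assumption about how they would appear in the actual header.

```python
import re

# atext characters from RFC 2822; dot-atom-text is runs of atext joined by single dots.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM_TEXT = rf"{ATEXT}(?:\.{ATEXT})*"

# Simplified msg-id: "<" id-left "@" id-right ">" with dot-atom-text on both sides.
# Quoted and literal forms, obsolete syntax, and CFWS are intentionally not handled.
MSG_ID_RE = re.compile(rf"<({DOT_ATOM_TEXT})@({DOT_ATOM_TEXT})>")


def is_simple_msg_id(value: str) -> bool:
    """Return True if value looks like <id-left@id-right> in the simplified grammar."""
    return MSG_ID_RE.fullmatch(value.strip()) is not None


# The two identifiers from the list above, wrapped in angle brackets (assumed).
thunderbird_cid = "<part1.12345678.12345678@domain.example.com>"
gmail_cid = "<ii_abc1234x0_12345ab12abcdefa>"

print(is_simple_msg_id(thunderbird_cid))
print(is_simple_msg_id(gmail_cid))
```

Under this simplified check, the Thunderbird-style value passes and the Gmail-style value fails, purely because the latter has no '@' and no right-hand side.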

I didn't test any more email clients (if someone did, it'd be great to complete the list above), but these two already show the striking difference. Google not obeying RFC standards? It sure looks smelly and I want to know whether that's because I missed something, or because the format isn't really that important after all (which in the long run feels rather disturbing). I'm also interested in checking how many popular email clients actually discard the 'at' symbol.


Solution

  • Go by what the spec says, not by what some mail clients do.

    So yes, a Content-Id header should have a value that conforms to the specification and should therefore contain an '@' symbol.

    The world of email is a broken hell hole of many different mail clients and servers doing their own thing and not respecting the standards.

    As someone who has written mail software for the past 17 years, I can assure you, this is not the only place that Google deviates from the specs.
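
For anyone generating these headers themselves, one way to get a value in the msg-id shape is Python's standard library: email.utils.make_msgid returns a string of the form <unique-part@domain>. The sketch below is only an illustration; the helper name build_inline_image_message and the example.com domain are assumptions, and the image bytes are a placeholder rather than a real picture.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_inline_image_message(image_bytes: bytes, domain: str):
    """Build an HTML message with one inline image (hypothetical helper)."""
    # make_msgid returns "<unique-part@domain>", angle brackets included.
    cid = make_msgid(domain=domain)

    msg = EmailMessage()
    # The HTML refers to the image with a cid: URL, which omits the angle brackets.
    msg.set_content(
        f'<html><body><img src="cid:{cid[1:-1]}"></body></html>',
        subtype="html",
    )
    # add_related converts the message to multipart/related and attaches the
    # image part with its Content-ID header set to the bracketed value.
    msg.add_related(image_bytes, maintype="image", subtype="png", cid=cid)
    return msg, cid


if __name__ == "__main__":
    # Placeholder bytes (just the PNG signature), and an assumed domain.
    message, content_id = build_inline_image_message(b"\x89PNG\r\n\x1a\n", "example.com")
    print(content_id)
```

The header carries the value with angle brackets, while the cid: URL in the HTML uses the same value without them.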