IMAP - rule for differentiating between inline and regular attachments


I am working on an email client, and I wonder what the correct algorithm is for deciding whether an attachment is a regular attachment (a downloadable file such as a PDF, video, or audio file) or an inline attachment (one that is just an embedded part of an HTML message). Until recently, I checked whether the body type is not TEXT (assuming the message part is not multipart; otherwise I parse it further recursively), that is, whether it is APPLICATION, IMAGE, AUDIO, or VIDEO. If so, I looked at whether the ninth element is ATTACHMENT or INLINE. I assumed that INLINE meant an embedded piece of the HTML rather than a regular attachment.

However, I recently came across an email that contained an HTML message body and regular attachments. The problem is that its body structure looked like this:

1. multipart/mixed
   1.1. multipart/alternative
        1.1.1. text/plain
        1.1.2. multipart/related
               1.1.2.1. text/html
               1.1.2.2. Inline jpeg
               1.1.2.3. Inline jpeg
   1.2. pdf inline (why 'inline'? Should be 'attachment')
   1.3. pdf inline (why 'inline'? Should be 'attachment')

The question is: why are the downloadable PDF files marked INLINE? And what is the appropriate algorithm for determining whether a file is an embedded part of the HTML or a downloadable file? Should I look at the parent's subtype to see whether it is related, and disregard the inline vs. attachment parameters?


Solution

  • There really is no defined one-size-fits-all algorithm. Whether a part is inline or attachment is set by the sender; it is a hint about whether they want it displayed inline (automatically rendered), as an attachment (displayed in a list), or neither (no preference).

    There are also what are sometimes called "embedded" attachments: attachments that have a Content-ID (it appears in the body structure response) and are referenced by a cid: URL in an <img> tag or the like.
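
To tell whether a part is "embedded" in this sense, you need the set of Content-IDs that the HTML body actually references. Here is a minimal sketch in Python, assuming you have already fetched and decoded the text/html part into a string. The function names and the regular expression are illustrative only, not part of any IMAP library, and a real client may prefer a proper HTML parser:

```python
import re
from typing import Set
from urllib.parse import unquote

# Hypothetical pattern: grabs whatever follows "cid:" up to a quote,
# whitespace, ")" or ">", which covers src="cid:..." and CSS url(cid:...).
_CID_RE = re.compile(r"""cid:([^'"\s)>]+)""", re.IGNORECASE)


def referenced_content_ids(html: str) -> Set[str]:
    """Return the Content-IDs referenced by cid: URLs in an HTML body."""
    # cid: URLs may be percent-encoded, so decode before comparing.
    return {unquote(match) for match in _CID_RE.findall(html)}


def normalize_content_id(raw: str) -> str:
    """Strip the angle brackets that wrap a Content-ID value."""
    return raw.strip().strip("<>")
```

A part whose normalized Content-ID appears in the returned set is one the HTML viewer will render itself.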

    So, this pretty much has to be done heuristically.

    It really depends on your needs and your client's capabilities, but here is a list of heuristics you may consider using in some combination (some of these are mutually exclusive):

    1. If it is marked 'attachment', treat it as an attachment.
    2. If it is marked inline, and it is something you can treat as inline (image/*, maybe text/* if you like), then it is inline.
    3. If it has a Content-ID, treat it inline.
    4. If it has a Content-ID and the HTML section references it, treat it as embedded (that is, the HTML viewer will render it); if it is not referenced, treat it as inline (or attachment) as your requirements dictate.
    5. If it is neither, and it is something you want to treat as inline, then treat it as inline.
    6. If nothing applies, treat it as an attachment.
    7. Ignore the disposition, and treat it as inline if you wish (such as making all images always inline).

    Also, the original meaning of inline was only that the sender wanted the part rendered automatically; this is often conflated with being referenced by the HTML section (which I've called embedded). These are not quite the same.
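
As one possible combination of heuristics 1, 2, 4, 5 and 6, here is a rough Python sketch. The Part class, the can_render_inline policy, and the order in which the rules are applied are all assumptions made for illustration; a real client would fill Part from its own BODYSTRUCTURE parser and tune the policy to what it can display:

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Part:
    """Hypothetical, already-parsed view of one non-multipart body part."""
    mime_type: str                     # lowercased, e.g. "image/jpeg"
    disposition: Optional[str] = None  # "attachment", "inline", or None
    content_id: Optional[str] = None   # without the surrounding <>


def can_render_inline(mime_type: str) -> bool:
    # Assumed policy: this client only renders images inline.
    return mime_type.startswith("image/")


def classify(part: Part, referenced_cids: Set[str]) -> str:
    """Return "embedded", "inline", or "attachment" for a leaf part."""
    # Heuristic 4: a Content-ID the HTML refers to is left to the HTML viewer.
    if part.content_id and part.content_id in referenced_cids:
        return "embedded"
    # Heuristic 1: an explicit "attachment" hint is honoured.
    if part.disposition == "attachment":
        return "attachment"
    # Heuristics 2 and 5: "inline" or no hint, and we are able to render it.
    if part.disposition in ("inline", None) and can_render_inline(part.mime_type):
        return "inline"
    # Heuristic 6: everything else ends up in the attachment list.
    return "attachment"
```

Under these assumptions, the PDFs from the question (marked inline, but not something this client renders) fall through to "attachment", while the JPEGs referenced from the HTML come out as "embedded".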