How to distinguish between inline images, signature images, and other blank images in email (IMAP)


I'm using MailKit to fetch email from a mailbox and save it to a database for display in my MVC application.

I save the HTML email as plain text in the database, and I can fetch attachments and save them to the file system. However, when an email contains inline images, I run into a problem: signature images and other blank images are also being saved as attachments in the file system.

Is there a way to distinguish between inline attachments and signature or other blank images?

Thanks in advance


Solution

  • It doesn't matter which IMAP library you use: none of them has a feature that does this for you, because it is a non-trivial problem that takes some ingenuity to solve.

    What you can do is start with the HtmlPreviewVisitor sample from the FAQ and modify it ever so slightly to split the attachments into 2 lists:

    1. The list of actual attachments
    2. The list of images actually referenced by the HTML (by walking the HTML and tracking which images are referenced)

    code:

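    using System;
    using System.Collections.Generic;
    using MimeKit;
    using MimeKit.Text;
    using MimeKit.Tnef;
    
    /// <summary>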
    /// Visits a MimeMessage and splits attachments into those that are
    /// referenced by the HTML body vs regular attachments.
    /// </summary>
    class AttachmentVisitor : MimeVisitor
    {
        List<MultipartRelated> stack = new List<MultipartRelated> ();
        List<MimeEntity> attachments = new List<MimeEntity> ();
        List<MimePart> embedded = new List<MimePart> ();
        bool foundBody;
    
        /// <summary>
        /// Creates a new AttachmentVisitor.
        /// </summary>
        public AttachmentVisitor ()
        {
        }
    
        /// <summary>
        /// The list of attachments that were in the MimeMessage.
        /// </summary>
        public IList<MimeEntity> Attachments {
            get { return attachments; }
        }
    
        /// <summary>
        /// The list of embedded images that were in the MimeMessage.
        /// </summary>
        public IList<MimePart> EmbeddedImages {
            get { return embedded; }
        }
    
        protected override void VisitMultipartAlternative (MultipartAlternative alternative)
        {
            // walk the multipart/alternative children backwards from greatest level of faithfulness to the least faithful
            for (int i = alternative.Count - 1; i >= 0 && !foundBody; i--)
                alternative[i].Accept (this);
        }
    
        protected override void VisitMultipartRelated (MultipartRelated related)
        {
            var root = related.Root;
    
            // push this multipart/related onto our stack
            stack.Add (related);
    
            // visit the root document
            root.Accept (this);
    
            // pop this multipart/related off our stack
            stack.RemoveAt (stack.Count - 1);
        }
    
        // look up the image based on the img src url within our multipart/related stack
        bool TryGetImage (string url, out MimePart image)
        {
            UriKind kind;
            int index;
            Uri uri;
    
            if (Uri.IsWellFormedUriString (url, UriKind.Absolute))
                kind = UriKind.Absolute;
            else if (Uri.IsWellFormedUriString (url, UriKind.Relative))
                kind = UriKind.Relative;
            else
                kind = UriKind.RelativeOrAbsolute;
    
            try {
                uri = new Uri (url, kind);
            } catch {
                image = null;
                return false;
            }
    
            for (int i = stack.Count - 1; i >= 0; i--) {
                if ((index = stack[i].IndexOf (uri)) == -1)
                    continue;
    
                image = stack[i][index] as MimePart;
                return image != null;
            }
    
            image = null;
    
            return false;
        }
    
        // called when an HTML tag is encountered
        void HtmlTagCallback (HtmlTagContext ctx, HtmlWriter htmlWriter)
        {
            if (ctx.TagId == HtmlTagId.Image && !ctx.IsEndTag && stack.Count > 0) {
                // search for the src= attribute
                foreach (var attribute in ctx.Attributes) {
                    if (attribute.Id == HtmlAttributeId.Src) {
                        MimePart image;
    
                        if (!TryGetImage (attribute.Value, out image))
                            continue;
    
                        if (!embedded.Contains (image))
                            embedded.Add (image);
                    }
                }
            }
        }
    
        protected override void VisitTextPart (TextPart entity)
        {
            TextConverter converter;
    
            if (foundBody) {
                // since we've already found the body, treat this as an
                // attachment
                attachments.Add (entity);
                return;
            }
    
            if (entity.IsHtml) {
                converter = new HtmlToHtml {
                    HtmlTagCallback = HtmlTagCallback
                };
    
                converter.Convert (entity.Text);
            }
    
            foundBody = true;
        }
    
        protected override void VisitTnefPart (TnefPart entity)
        {
            // extract any attachments in the MS-TNEF part
            attachments.AddRange (entity.ExtractAttachments ());
        }
    
        protected override void VisitMessagePart (MessagePart entity)
        {
            // treat message/rfc822 parts as attachments
            attachments.Add (entity);
        }
    
        protected override void VisitMimePart (MimePart entity)
        {
            // realistically, if we've gotten this far, then we can treat
            // this as an attachment even if the IsAttachment property is
            // false.
            attachments.Add (entity);
        }
    }
    

    To use it:

    var visitor = new AttachmentVisitor ();
    
    message.Accept (visitor);
    
    // Now you can use visitor.Attachments and visitor.EmbeddedImages
    

    An even simpler, although less error-proof way of doing it (since it doesn't actually verify whether the image is referenced by the HTML) is this:

    // requires: using System; using System.Linq;
    var embeddedImages = message.BodyParts.OfType<MimePart> ().
        Where (x => x.ContentType.IsMimeType ("image", "*") &&
               x.ContentDisposition != null &&
               x.ContentDisposition.Disposition.Equals ("inline", StringComparison.OrdinalIgnoreCase));
    

    Now that you have your list of embeddedImages, you'll have to figure out a way to determine if they are only used in the signature or used elsewhere in the HTML.

    Most likely you'll have to analyze the HTML itself as well.

    It is also probably worth noting that some HTML mail will reference images located on the web that are not embedded in the MIME of the message. If you want these images as well, you'll need to modify TryGetImage to fall back to downloading the image from the web if the code I provided fails to locate it within the MIME of the message.
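
    As a rough sketch of the first step of such a fallback, the helper below decides whether an img src value points at the web (http or https) rather than at a part inside the message (for example a cid: URL). The names ImageSource and IsRemote are hypothetical and not part of MimeKit; the download itself is left out, and whether you want to fetch remote images at all is a separate decision.

    ```csharp
    using System;

    static class ImageSource
    {
        // Returns true when the img src value is an absolute http or https URL,
        // i.e. something that could be downloaded instead of looked up in the
        // MIME tree. cid: URLs and relative URLs return false.
        public static bool IsRemote (string src, out Uri uri)
        {
            if (Uri.TryCreate (src, UriKind.Absolute, out uri) &&
                (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
                return true;

            uri = null;
            return false;
        }
    }
    ```

    One possible use is to call ImageSource.IsRemote (attribute.Value, out uri) in HtmlTagCallback when TryGetImage returns false, and record the resulting Uri in a separate list of remote images.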

    For text/plain messages (which can't use images at all), the common convention to separate the signature from the rest of the message body is a line containing only 2 dashes followed by a space: "-- ".
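
    A minimal sketch of applying that convention, assuming the body has already been decoded to a string. PlainTextSignature and TrySplit are hypothetical names, and splitting at the first matching line (rather than the last) is a simplifying choice that may not suit replies that quote other signatures.

    ```csharp
    using System;

    static class PlainTextSignature
    {
        // Splits a text/plain body at the first line that is exactly "-- "
        // (two dashes and one trailing space). On success, body and signature
        // use "\n" line endings. Returns false when no delimiter line exists,
        // in which case body is the unmodified input.
        public static bool TrySplit (string text, out string body, out string signature)
        {
            var lines = text.Replace ("\r\n", "\n").Split ('\n');

            for (int i = 0; i < lines.Length; i++) {
                if (lines[i] == "-- ") {
                    body = string.Join ("\n", lines, 0, i);
                    signature = string.Join ("\n", lines, i + 1, lines.Length - i - 1);
                    return true;
                }
            }

            body = text;
            signature = string.Empty;
            return false;
        }
    }
    ```

    Note that a line holding only "--" without the trailing space does not match, so text that has had trailing whitespace trimmed will not be split.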

    From my limited experience with HTML messages that have signatures, they do not appear to follow a similar convention. In the few HTML messages I receive from co-workers at Microsoft using Outlook, the signatures appear to be within a <table> at the end of the message. However, this assumes that the message is not a reply. Once you start parsing message replies, this <table> ends up somewhere in the middle of the message, because the original message being replied to is at the end.

    Since everyone's signature is different as well, I'm not sure if this <table> similarity is an Outlook convention or if people are manually constructing their signatures and just happen to use tables by coincidence (I've also only seen a few; most people do not use signatures, so my sample size is very small).