Tags: c#, html, email, mimekit, mime-message

MimeKit Remove gif images from Emails


I'm trying to strip gif images from emails in order to save storage space in Outlook and our document management system.

For example, given an email of roughly 2 MB that contains a 1 MB gif, I'd expect the resulting email to be about 1 MB.

The first part uses MimeKit to remove the gif. The problem with this code is that, when run outside the debugger, it doesn't reduce the file size by as much as I'd expect. I've found this is because the image is still present in the HTML properties of the MimeMessage.

        MimeMessage mimeMessage = MimeMessage.Load(testFile);
                   
        var bodyParts = mimeMessage.BodyParts.ToList();
        if (bodyParts.Any())
        {
            var multipart = mimeMessage.Body as Multipart;
            if (multipart != null)
            {
                MimeEntity bodyPartToRemove = null;
                foreach (var bodyPart in bodyParts)
                {
                    var mimeBodyPart = bodyPart as MimePart;
                    if (mimeBodyPart == null)
                    {
                        continue;
                    }
                    if (mimeBodyPart.ContentType.MimeType == "image/gif")
                    {
                        bodyPartToRemove = mimeBodyPart;
                    }
                }
                
                if (bodyPartToRemove != null)
                {
                    multipart.Remove(bodyPartToRemove);
                }
            }
            
            mimeMessage.Body = multipart;
        }

So after this I thought I'd use HtmlAgilityPack to remove the img tags from the html and then use the MimeKit.BodyBuilder to set the MimeMessage correctly.

        var builder = new BodyBuilder();

        // Set the plain-text version of the message text
        builder.TextBody = mimeMessage.TextBody;

        // Set the html version of the message text
        builder.HtmlBody = StripHtml(mimeMessage.HtmlBody);
        
        // Attachments
        foreach (var blah in mimeMessage.Attachments)
            builder.Attachments.Add(blah);

        mimeMessage.Body = builder.ToMessageBody();



    private string StripHtml(string html)
    {
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        var nodes = htmlDoc.DocumentNode.SelectNodes("//img");

        foreach (var node in nodes)
        {
            if (node.OuterHtml.Contains(".gif"))
                node.Remove();
        }

        return htmlDoc.DocumentNode.InnerHtml;
    }

The problem with this solution is that after calling builder.ToMessageBody(), the other non-gif images contained in the email are no longer displayed, and other parts of the email, such as emojis, no longer render correctly.

Has anyone come across this before?


Solution

  • You have 2 questions that I'll answer separately.

    Why doesn't the size of the message shrink after I remove the gif attachments?

    MIME messages can contain nested multiparts, and yours likely does: the HTML body and its images are usually wrapped in a multipart/related, which is itself often inside a multipart/alternative, like this:

    multipart/mixed
      multipart/alternative
        text/plain   <-- text-only version of the message body
        multipart/related
          text/html  <-- html version of the message body
          image/jpeg <-- an image "embedded" in the html body
          image/gif  <-- a gif image embedded in the html body
      application/pdf  <-- an attachment
    

    MimeMessage.BodyParts recursively lists all MIME parts in the message, which in the example case above would be all of the text parts, image parts and the pdf.
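
    To see how your own message is nested, you could dump its structure. The following is a minimal sketch, not part of MimeKit: MimeTreeDump and Describe are hypothetical names, and the helper simply recurses through Multipart children (and any attached message/rfc822 parts), indenting by depth.

    ```csharp
    using System;
    using System.Collections.Generic;
    using MimeKit;

    static class MimeTreeDump
    {
        // Hypothetical helper: returns one line per MIME entity,
        // indented two spaces per nesting level.
        public static List<string> Describe (MimeEntity entity, int depth = 0, List<string> lines = null)
        {
            lines = lines ?? new List<string> ();
            lines.Add (new string (' ', depth * 2) + entity.ContentType.MimeType);

            if (entity is Multipart multipart) {
                // a Multipart is a collection of child entities
                foreach (var child in multipart)
                    Describe (child, depth + 1, lines);
            } else if (entity is MessagePart rfc822 && rfc822.Message?.Body != null) {
                // an attached message has its own body tree
                Describe (rfc822.Message.Body, depth + 1, lines);
            }

            return lines;
        }
    }
    ```

    Calling MimeTreeDump.Describe (message.Body) and printing each line should give a tree shaped like the one above, which makes it easy to spot which multipart actually owns the gif.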

    The problem is that your code assumes that the top-level multipart is the direct parent of all of these parts and it's not, so removing the gif in the above example would be a no-op. This is likely why the size of the message isn't going down as much as you expected it to.
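
    One way to confirm this is to check the return value of Multipart.Remove, which reports whether the entity was actually found in that multipart. A small sketch (RemoveCheck and TryRemoveFromTopLevel are hypothetical names used only for illustration):

    ```csharp
    using System;
    using MimeKit;

    static class RemoveCheck
    {
        // Returns true only if 'part' was a direct child of the
        // message's top-level multipart and was therefore removed.
        public static bool TryRemoveFromTopLevel (MimeMessage message, MimeEntity part)
        {
            return message.Body is Multipart top && top.Remove (part);
        }
    }
    ```

    For a gif nested inside a multipart/related, this should return false and leave the message unchanged, which matches the behaviour you are seeing.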

    I should probably have a FAQ about this, because the example of how to remove attachments in the MimeIterator documentation isn't easy to find. That said, here's a customized version of that code snippet that will do what you need:

    var multiparts = new List<Multipart> ();
    var gifs = new List<MimePart> ();
    
    using (var iter = new MimeIterator (message)) {
        // collect each gif and its parent multipart
        while (iter.MoveNext ()) {
            var multipart = iter.Parent as Multipart;
            var part = iter.Current as MimePart;
    
            if (multipart != null && part != null && part.ContentType.IsMimeType ("Image", "gif")) {
                // keep track of each gif's parent multipart
                multiparts.Add (multipart);
                gifs.Add (part);
            }
        }
    }
    
    // now remove each gif from its parent multipart...
    for (int i = 0; i < gifs.Count; i++)
        multiparts[i].Remove (gifs[i]);
    

    Why do all of the emojis disappear after using BodyBuilder to replace the message body?

    When you are constructing the BodyBuilder, you add all of the Attachments, sure, but you are not adding any of the inline body parts. The emojis are probably inline body parts and not attachments, so you are losing them.

    If you refer back up to the sample MIME message structure in the previous section of this answer, typically only the body parts contained in the outer multipart/mixed MIME part will be marked as attachment while the images within the multipart/related will be inline because they are meant to be displayed inline (aka embedded) with the message body.

    Before we continue, I should note that the size of the message isn't drastically affected by the <img> tags in the HTML unless they contain the raw image data encoded in base64 or something (which is totally possible, but not likely).

    I probably wouldn't bother tampering with the HTML, but... if you really want to, a quick hack might look something like this:

    var multiparts = new List<Multipart> ();
    var gifs = new List<MimePart> ();
    
    using (var iter = new MimeIterator (message)) {
        // collect each gif and its parent multipart
        while (iter.MoveNext ()) {
            var multipart = iter.Parent as Multipart;
            var part = iter.Current as MimePart;
    
            if (part == null)
                continue;
    
            if (multipart != null && part.ContentType.IsMimeType ("Image", "gif")) {
                // keep track of each gif's parent multipart
                multiparts.Add (multipart);
                gifs.Add (part);
            } else if (part is TextPart text && text.IsHtml) {
                text.Text = StripHtml (text.Text);
            }
        }
    }
    
    // now remove each gif from its parent multipart...
    for (int i = 0; i < gifs.Count; i++)
        multiparts[i].Remove (gifs[i]);
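
    One caveat about the StripHtml method from the question: embedded images are usually referenced from the HTML as src="cid:<content-id>" rather than by a file name, so a check for ".gif" may not match them, and HtmlAgilityPack's SelectNodes returns null when nothing matches. A hedged alternative, assuming the removed gifs carry a Content-Id (HtmlGifStripper and RemoveImagesByContentId are hypothetical names, not part of either library):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    static class HtmlGifStripper
    {
        // Removes <img> tags whose src is "cid:<id>" for one of the given Content-Ids.
        public static string RemoveImagesByContentId (string html, IEnumerable<string> contentIds)
        {
            var ids = new HashSet<string> (contentIds.Where (id => !string.IsNullOrEmpty (id)), StringComparer.Ordinal);
            var doc = new HtmlDocument ();
            doc.LoadHtml (html ?? string.Empty);

            // SelectNodes returns null (not an empty collection) when nothing matches.
            var images = doc.DocumentNode.SelectNodes ("//img[@src]");
            if (images == null)
                return doc.DocumentNode.InnerHtml;

            // Copy to a list so removing nodes doesn't disturb the enumeration.
            foreach (var img in images.ToList ()) {
                var src = img.GetAttributeValue ("src", string.Empty);
                if (src.StartsWith ("cid:", StringComparison.OrdinalIgnoreCase) &&
                    ids.Contains (Uri.UnescapeDataString (src.Substring (4))))
                    img.Remove ();
            }

            return doc.DocumentNode.InnerHtml;
        }
    }
    ```

    Because the HTML part is normally visited before the images that follow it, you would call this in a second pass once the gifs list is complete, for example text.Text = HtmlGifStripper.RemoveImagesByContentId (text.Text, gifs.Select (g => g.ContentId)).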
    

    Finally, don't forget to write the message back out using message.WriteTo(fileName). I mention this because some people seem to assume that things are auto-saved when any changes to the message are made.
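
    As a rough illustration of that last step, here is a sketch that writes the modified message to a new file and reports how many bytes were saved compared with the original. MessageSaver, SaveAndMeasure and both path parameters are hypothetical; only MimeMessage.WriteTo comes from MimeKit.

    ```csharp
    using System;
    using System.IO;
    using MimeKit;

    static class MessageSaver
    {
        // Writes the (already modified) message to outputPath and returns
        // the size difference in bytes relative to the file at inputPath.
        public static long SaveAndMeasure (MimeMessage message, string inputPath, string outputPath)
        {
            message.WriteTo (outputPath);
            return new FileInfo (inputPath).Length - new FileInfo (outputPath).Length;
        }
    }
    ```

    Writing to a separate output path keeps the original file intact, so you can compare the two before replacing anything in Outlook or the document management system.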