Search code examples
c#.netemailemail-processing

Is it possible to programmatically 'clean' emails?


Does anyone have any suggestions as to how I can clean the body of incoming emails? I want to strip out disclaimers, images and maybe any previous email text that may be also be present so that I am left with just the body text content. My guess is it isn't going to be possible in any reliable way, but has anyone tried it? Are there any libraries geared towards this sort of thing?


Solution

  • In email, there is couple of agreed markings that mean something you wish to strip. You can look for these lines using regular expressions. I doubt you can't really well "sanitize" your emails, but some things you can look for:

    1. Line starting with "> " (greater than then whitespace) marks a quote
    2. Line with "-- " (two hyphens then whitespace then linefeed) marks the beginning of a signature, see Signature block on Wikipedia
    3. Multipart messages, boundaries start with --, beyond that you need to do some searching to separate the message body parts from unwanted parts (like base64 images)

    As for an actual C# implementation, I leave that for you or other SOers.