Tags: php, regex, email, html-email

Display mail content without external sources (images, css, etc.)


I am writing a mail client in PHP for my own use. Unfortunately, senders tend to embed third-party resources in their messages, especially in spam. I would like to prevent external scripts, images, or any other kind of data from being loaded.

I am looking for a solution and hope to find one with your help. It can be anything from DOM manipulation or regular expressions to an iframe or HTTP headers.

So far I am stuck getting started, because I would like to avoid regular expressions and am hoping for a simple HTTP header setting, or something similar, that blocks connections to third parties.


Solution

  • This is not what I originally intended, only a workaround:

    • Parse the DOM and remove all blacklisted tags, including their contents.
    • Parse the DOM and remove all blacklisted attributes.
    • Parse the DOM and rename every element that is not on the whitelist to "span".

    The blacklist could contain "script", "embed", "iframe" and similar tags; the whitelist could contain "a", "p", "br" and similar. Blacklisted attributes could include "onclick", while whitelisted attributes such as "href" are kept.

    This is still incomplete, because the style attribute can contain "background" or other properties with a URL pointing to a third party, and those should be removed too.
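The three workaround steps could be sketched with PHP's built-in DOMDocument roughly as follows. This is a minimal sketch under assumptions: the function name `sanitize_mail_html` and the three lists are hypothetical examples, not a complete or vetted policy. Attributes are handled by whitelist rather than blacklist, and this version simply drops the `style` attribute instead of inspecting it. Because PHP's DOM cannot rename an element, step 3 builds a new `span` and moves the children into it.

```php
<?php
// Hypothetical lists for illustration; tune them for your own client.
const BLACKLIST_TAGS = ['script', 'style', 'iframe', 'frame', 'embed', 'object',
    'img', 'link', 'meta', 'base', 'form'];
// 'html', 'head' and 'body' are kept because loadHTML() wraps fragments in them.
const WHITELIST_TAGS = ['html', 'head', 'body', 'a', 'p', 'br', 'b', 'i', 'u',
    'strong', 'em', 'span', 'div', 'ul', 'ol', 'li', 'blockquote', 'pre',
    'table', 'thead', 'tbody', 'tr', 'td', 'th'];
// Every attribute not listed here (onclick, src, background, style, ...) is dropped.
const WHITELIST_ATTRS = ['href', 'title'];

function sanitize_mail_html(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta prefix tells libxml the input is UTF-8; LIBXML_NONET forbids
    // network access while parsing.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html,
        LIBXML_NONET
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the elements: the live DOMNodeList would shift while we edit the tree.
    $elements = iterator_to_array($doc->getElementsByTagName('*'), false);

    foreach ($elements as $el) {
        $tag = strtolower($el->nodeName);

        // Step 1: remove blacklisted elements together with their content.
        if (in_array($tag, BLACKLIST_TAGS, true)) {
            $el->parentNode->removeChild($el);
            continue;
        }

        // Step 2: remove every attribute that is not whitelisted.
        $names = [];
        foreach ($el->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $lower = strtolower($name);
            $allowed = in_array($lower, WHITELIST_ATTRS, true);
            if ($allowed && $lower === 'href') {
                // Only plain web and mail links survive; javascript: and data: are dropped.
                $allowed = preg_match('#^(https?|mailto):#i', trim($el->getAttribute($name))) === 1;
            }
            if (!$allowed) {
                $el->removeAttribute($name);
            }
        }

        // Step 3: replace every other non-whitelisted element with a <span>.
        if (!in_array($tag, WHITELIST_TAGS, true)) {
            $span = $doc->createElement('span');
            while ($el->firstChild !== null) {
                $span->appendChild($el->firstChild);
            }
            foreach ($el->attributes as $attr) {
                $span->setAttribute($attr->nodeName, $attr->nodeValue);
            }
            $el->parentNode->replaceChild($span, $el);
        }
    }

    // Return only the contents of <body>; the wrapper and <head> are discarded.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}
```

Usage would be along the lines of `echo sanitize_mail_html($mailBody);`. A whitelist was chosen for attributes because a blacklist of event handlers and URL-bearing attributes (`src`, `background`, `poster`, ...) is easy to leave incomplete.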
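To keep inline styles instead of dropping them, each `style` attribute value could be checked before it is kept, and the attribute removed whenever the check fails. The helper below, `style_is_self_contained` (a hypothetical name), is a conservative sketch under assumptions: it rejects any value containing a CSS function that can reference a resource (`url()`, `image()`, `image-set()`, `cross-fade()`), the legacy `expression()`, `@import`, or any backslash, since CSS escapes such as `u\72 l(` decode to `url(` and would otherwise slip past the pattern. The list is an assumption and not exhaustive; whitelisting individual CSS properties would be stricter.

```php
<?php
/**
 * Hypothetical helper: returns true only if an inline style value contains
 * nothing that could make the browser fetch an external resource.
 * Deliberately conservative: any backslash is rejected, because a CSS escape
 * (for example u\72 l( which decodes to url( ) could hide a function name.
 */
function style_is_self_contained(string $style): bool
{
    // Resource-loading CSS functions, legacy IE expression(), @import, or a backslash.
    $pattern = '/(?:url|image|image-set|cross-fade|expression)\s*\(|@import|\\\\/i';
    // preg_match() returns 0 for "no match"; 1 or a regex error both count as unsafe.
    return preg_match($pattern, $style) === 0;
}
```

False positives (for instance a font name containing "image (") only cost some formatting, which seems an acceptable trade for a mail client that must never load remote content.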