php utf-8 ascii html-email quoted-printable

Random HTML characters being encoded in emails


I'm generating an email with PHP that outputs an HTML table. Most of the table comes through fine, but some of the < and > characters are randomly encoded to &lt; and &gt;. It doesn't always do it in the same place. Sometimes it just happens in one place, sometimes not at all, and sometimes in multiple places.

Here's a code snippet from the middle of my table as my email client sees it. Note the inserted &lt; /tr&gt; that should not be there:

<tr>  
  <td>SERVER_SOFTWARE</td>
  <td>Apache/2.2.29 (Red Hat)</td>
</tr>
<tr>
  <td>SERVER_PROTOCOL</td>
  <td>HTTP/1.1</td>
  &lt; /tr&gt;
</tr>
<tr>
  <td>REQUEST_METHOD</td>
  <td>POST</td>
</tr>

And the same segment in the plaintext part of the email (again, note that < /tr> somehow gets inserted):

SERVER_SOFTWARE Apache/2.2.29 (Red Hat)
SERVER_PROTOCOL HTTP/1.1 < /tr>
REQUEST_METHOD POST

I'm setting it to UTF-8 in the headers before sending:

$headers  = "MIME-Version: 1.0\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable";

(P.S. I was having the exact same problem earlier using charset=ISO-8859-1.)

But despite this, it is somehow being displayed in US-ASCII:

Content-type: text/html;
    charset="US-ASCII"
Content-transfer-encoding: quoted-printable

The PHP script that's generating the email looks like this:

//generate $table
$indicesServer = array(
    'PHP_SELF', 'argv', 'argc', 'GATEWAY_INTERFACE', 'SERVER_ADDR',
    'SERVER_NAME', 'SERVER_SOFTWARE', 'SERVER_PROTOCOL', 'REQUEST_METHOD',
    'REQUEST_TIME', 'REQUEST_TIME_FLOAT', 'QUERY_STRING', 'DOCUMENT_ROOT',
    'HTTP_ACCEPT', 'HTTP_ACCEPT_CHARSET', 'HTTP_ACCEPT_ENCODING',
    'HTTP_ACCEPT_LANGUAGE', 'HTTP_CONNECTION', 'HTTP_HOST', 'HTTP_REFERER',
    'HTTP_USER_AGENT', 'HTTPS', 'REMOTE_ADDR', 'REMOTE_HOST', 'REMOTE_PORT',
    'REMOTE_USER', 'REDIRECT_REMOTE_USER', 'SCRIPT_FILENAME', 'SERVER_ADMIN',
    'SERVER_PORT', 'SERVER_SIGNATURE', 'PATH_TRANSLATED', 'SCRIPT_NAME',
    'REQUEST_URI', 'PHP_AUTH_DIGEST', 'PHP_AUTH_USER', 'PHP_AUTH_PW',
    'AUTH_TYPE', 'PATH_INFO', 'ORIG_PATH_INFO'
);
$table = '<table cellpadding="3" cellspacing="0" border="1" bordercolor="#bbb">';
foreach ($indicesServer as $arg) {
    if (isset($_SERVER[$arg])) {
        $table .= '<tr><td>'.$arg.'</td><td>' . $_SERVER[$arg] . '</td></tr>' ;
    } else {
        $table .= '<tr><td>'.$arg.'</td><td>-</td></tr>' ;
    }
}
$table .=  '</table>' ;

//set up email
$to = [redacted];
$subject = [redacted];
$email_body = "Heres data:" . $table;
$headers  = "MIME-Version: 1.0\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable";

//send email
mail($to, $subject, $email_body, $headers);

EDIT: I've noticed HTML attributes are getting messed up. It's related to the quoted-printable encoding of equals signs. = is encoded to =3D as expected, but then sometimes the next character is deleted! Thus the following is happening:

<a href="http://example.com"> becomes <a href=3D"ttp://example.com">

<table cellpadding=3 cellspacing=0 border=1> becomes <table cellpadding<ellspacingorder=3D"&lt;tr">


Solution

  • My guess is that since that's a closing "tr" that shouldn't be there (you have another right after it), some friendly HTML parser is "helping" you by turning the stray tag into plain text.

    Another thought:

    See here: https://support.sendgrid.com/hc/en-us/articles/200182068-HTML-Formatting-Issues

    1. Some mail clients, such as Outlook and Thunderbird, appear to insert double spacing line breaks at every line. The reason is that the 'content-transfer-encoding' in MIME is set to 'quoted-printable' which adds Carriage Return Line Feed (CRLF) line breaks to the source content of the email which are characters interpreted by these mail clients. To alleviate this problem, please do the following:

    a. If you can customize the MIME settings for your email, set the 'Content-Transfer-Encoding' to '7bit' instead of 'Quoted-Printable.'

    b. Ensure that your content follows the line length limits from item 2 above.

    I wonder if something is inserting a line break inside your tag, making it unreadable, and the browser is then adding an extra closing tag as a replacement.
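One way to test the line-break theory is to put a CRLF after every table row yourself, so that no line of the body comes near the 998-character limit that the mail standards place on a single line. This is only a sketch: `buildServerTable` is a hypothetical helper name, not part of the original script, and it also escapes values with `htmlspecialchars`, which the original code does not do.

```php
<?php
// Hypothetical helper: emits one table row per line instead of one long line.
function buildServerTable(array $keys, array $server): string
{
    $rows = ['<table cellpadding="3" cellspacing="0" border="1">'];
    foreach ($keys as $key) {
        $value = isset($server[$key]) ? $server[$key] : '-';
        if (is_array($value)) {
            // Entries such as argv are arrays; flatten them for display.
            $value = implode(' ', $value);
        }
        $rows[] = '<tr><td>' . htmlspecialchars($key) . '</td><td>'
                . htmlspecialchars((string) $value) . '</td></tr>';
    }
    $rows[] = '</table>';

    // A CRLF after each row keeps individual lines short for typical values.
    return implode("\r\n", $rows);
}

$table = buildServerTable(['SERVER_PROTOCOL', 'REQUEST_METHOD'], $_SERVER);
```

If the stray tags disappear once the rows are on separate lines, that points to something in the delivery path re-wrapping an overly long line.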

    Can you try this: change 'Content-Transfer-Encoding' to '7bit' or leave it out entirely?
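A minimal sketch of that change, with one alternative that is not part of the suggestion above: if you want to keep quoted-printable, encode the body with PHP's built-in `quoted_printable_encode()` so that the header describes what is actually sent. The sample body and the variable names are assumptions for illustration only.

```php
<?php
// Sample body for illustration only.
$html = "Here's the data:\r\n"
      . '<table cellpadding="3"><tr><td>REQUEST_METHOD</td><td>POST</td></tr></table>';

$commonHeaders = "MIME-Version: 1.0\r\n"
               . "Content-Type: text/html; charset=UTF-8\r\n";

// Option 1 (the suggestion above): declare 7bit and send the body unencoded.
// This assumes the body is plain ASCII with reasonably short lines.
$headers7bit = $commonHeaders . "Content-Transfer-Encoding: 7bit";
$body7bit    = $html;

// Option 2 (an alternative): keep the quoted-printable header, but encode
// the body to match. "=" becomes "=3D" and long lines get soft line breaks.
$headersQp = $commonHeaders . "Content-Transfer-Encoding: quoted-printable";
$bodyQp    = quoted_printable_encode($html);

// $to and $subject are whatever the original script uses (redacted in the post).
// mail($to, $subject, $body7bit, $headers7bit);
// mail($to, $subject, $bodyQp, $headersQp);
```

Either way, the point is that the declared encoding and the body agree. A body sent unencoded under a quoted-printable header can have its `=` characters misread as escape sequences by the receiving side.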