Search code examples
phpencodingutf-8imap

Decode email messages to UTF-7 with PHP


I have been trying convert many IMAP message bodies to something more readable (UTF-8 or equivalent). I cannot seem to find an out-of-the box function to work.

Here is an example of what I am trying to decode:

President Trump signed an executive order Thursday tar= geting North Korea=E2=80=99s trading partners, calling it a =E2=80=9Cpowerful=E2= =80=9D new tool aimed at isolating and de-nuclearizing the regime.


More on thi= s: http://www.foxnews.com= /politics/2017/09/21/trump-signs-executive-order-targeting-north-koreas-tra= ding-partners.html

(in the sample above, any "= ", there should be a newline)

A few things that I have tried:

iconv("UTF-8", "Windows-1252//TRANSLIT//IGNORE", $data); 
//this resulted in a server error 500

imap_mime_header_decode($data); 
//this outputs an array (just something that I tried; yes, I know that it is only good for headers)

iconv_mime_decode($test, 0, "ISO-8859-1");
//This works for a few messages (plaintext ones) but does not output anything for the example above; for others, it only outputs part of the message body

mb_convert_encoding($test, "UTF8");
//this results in another internal server error!

$data = str_replace("=92", "'", $data);
//I have also tried to manually find and replace an occurrence of a utf-7 (I guess) encoded string

Anyways, there is something that I am doing totally wrong but not sure what. How do you all read the body of an email retrieved with IMAP?

What are some other things that I can try? People must do just this the entire time but I can't seem to find a solution...

Thank you, Rog


Solution

  • You're not actually dealing with the UTF-7 encoding here. What you're actually seeing is quoted-printable.

    php contains a function to decode this

    I actually haven't written php in quite some time so forgive my style failures, here's an example which decodes your text:

    <?php
    $s = 'President Trump signed an executive order Thursday tar= geting North Korea=E2=80=99s trading partners, calling it a =E2=80=9Cpowerful=E2= =80=9D new tool aimed at isolating and de-nuclearizing the regime.';
    
    // It's unclear why I have to replace out `= `, I have a feeling these
    // are actually newlines and copy paste error?
    echo quoted_printable_decode(str_replace('= ', '', $s));
    ?>
    

    When run it produces:

    President Trump signed an executive order Thursday targeting North Korea’s trading partners, calling it a “powerful” new tool aimed at isolating and de-nuclearizing the regime.