Tags: php, character-encoding, imap, iconv

ICONV function and Windows-1252


I have an application that reads email from a webmail account and saves the data in a database. I am using PHP's imap library to do most of the work.

The problem is that the emails come in more than one charset (mostly ISO-8859-1 and UTF-8), so I have to read the charset from each email and convert the text to ISO-8859-1 using the iconv function.

It works fine for most charsets, but when the email uses the Windows-1252 charset and I try to convert it, the iconv function doesn't return anything.

If I replace the iconv call with mb_convert_encoding, it doesn't convert all of the characters correctly.

This is my code:

if( $part->parameters[$i]->attribute == 'charset' )
    $charset =  $part->parameters[$i]->value;

if (strtolower($charset) != 'iso-8859-1')
    $this->emailMessageTxt = iconv($charset, 'iso-8859-1', $this->emailMessageTxt);

Is there an error in there?


Solution

  • Yes, you are trying to convert from any other charset to ISO-8859-1. ISO-8859-1 can represent only a small set of characters; for example, it cannot represent the character €.

    You should work the other way around, converting everything that is not UTF-8 to UTF-8, which can represent every Unicode character.

    If you want to ignore characters that cannot be represented, just do:

    $utf8 = "€€€ money"; // this PHP source file is saved as UTF-8, so the literal is UTF-8
    
    $iso8859 = iconv( "UTF-8", "ISO-8859-1//IGNORE", $utf8 );
    
    echo $iso8859; // " money"
    

    That is, pass "ISO-8859-1//IGNORE" as the output charset.

    From the PHP documentation for iconv:

    out_charset: The output charset.

    If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character and an E_NOTICE is generated.
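The advice above, converting every non-UTF-8 message to UTF-8 rather than to ISO-8859-1, could be sketched roughly as follows. This is only a minimal illustration, not the asker's actual code: the helper name `toUtf8` is hypothetical, and it assumes the charset label read from the email is one that the local iconv implementation recognizes.

```php
<?php
// Hypothetical helper (not from the original post): convert text from its
// declared charset to UTF-8 instead of down-converting to ISO-8859-1.
function toUtf8(string $text, string $charset): string
{
    $charset = strtoupper(trim($charset));

    // Already UTF-8, or no charset declared: leave the text untouched.
    if ($charset === '' || $charset === 'UTF-8' || $charset === 'UTF8') {
        return $text;
    }

    // iconv() returns false on failure (for example, an unknown charset label);
    // @ silences the accompanying notice/warning so the failure is handled here.
    $converted = @iconv($charset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $charset to UTF-8");
    }

    return $converted;
}

// Byte 0x80 is the euro sign in Windows-1252; ISO-8859-1 has no euro sign,
// which is why converting this text to ISO-8859-1 fails.
$body = "\x80 100";
echo toUtf8($body, 'Windows-1252'), "\n"; // prints "€ 100" on a UTF-8 terminal
```

Because the target is UTF-8, no //IGNORE or //TRANSLIT suffix is needed here: every character that Windows-1252 or ISO-8859-1 defines has a UTF-8 representation. The database column and connection would then also need to use UTF-8, which is outside the scope of this sketch.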