
Problem converting ISO8859-1 to UTF-8 in PHP


I am trying to convert an ISO-8859-1 string taken from a MySQL database to UTF-8 using PHP. However, when I use the utf8_encode function, it removes almost all of the apostrophes from the string (the exceptions seem to be within HTML fields).

Thanks


Solution

  • Your ‘ISO-8859-1’ content is probably not actually ISO-8859-1.

    When you say Content-Type: text/html; charset=iso-8859-1, browsers don't actually use ISO-8859-1, for annoying historical reasons. They really use Windows code page 1252 (Western European), which is very similar to ISO-8859-1, but not the same.

    In particular, the bytes in the range 0x80-0x9F represent invisible and seldom-used control codes in ISO-8859-1. But cp1252 adds some typographical niceties and other extensions in this range, including the ‘smart quotes’. When you write an apostrophe in MS Word, it changes it to a curly single smart quote ’ (byte 0x92 in cp1252), so it's common to have encoding problems with text that was originally typed in Word and other Office apps.

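    To convert cp1252 to UTF-8 you would have to use iconv('cp1252', 'utf-8', $somestring) rather than utf8_encode, which is tied to ‘real’ ISO-8859-1. Under ISO-8859-1 the byte 0x92 becomes the invisible control character U+0092, which is why the apostrophes appear to vanish.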
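
As a rough illustration of the difference, the sketch below converts the same bytes twice with iconv, once treating them as cp1252 and once as ISO-8859-1. The sample string and the variable names are made up for the example; it assumes the apostrophe was stored as the single byte 0x92, as Word-originated text usually is.

```php
<?php
// Hypothetical sample: "It's" with the apostrophe stored as the byte 0x92.
$fromDb = "It\x92s";

// Read as cp1252, 0x92 is U+2019 (the curly apostrophe), UTF-8 bytes E2 80 99.
$asCp1252 = iconv('CP1252', 'UTF-8', $fromDb);

// Read as ISO-8859-1, 0x92 is the control character U+0092, UTF-8 bytes C2 92.
// This is the interpretation utf8_encode is tied to.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $fromDb);

echo bin2hex($asCp1252), "\n"; // 4974e2809973
echo bin2hex($asLatin1), "\n"; // 4974c29273
```

If the second form shows up in your output, the apostrophe is still in the string but renders as nothing, so it is worth checking the raw bytes with bin2hex before assuming characters were dropped.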