php utf-8 iso-8859-1

Translating ISO-8859-1 to UTF-8 problem


One of my projects pulls a document from the web and reads it. This document is provided by a third party and will not change (the content will, but the formatting and other details will not). The problem is that the document includes content copied and pasted from Word, which is UTF-8; however, the document itself is encoded in ISO-8859-1, so these characters get saved to the database as '?'.

If I paste over the text and re-encode it in UTF-8, instead of getting the smart quotes and em dashes, I just get two garbage characters.

How can I convert this ISO-8859-1 document with UTF-8 characters back into UTF-8 so it can be displayed as it was originally created?


Solution

  • I found the solution here: PHP: Problems converting "’" character from ISO-8859-1 to UTF-8

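    The server claims it's serving up ISO-8859-1, but the content is really Windows-1252, which converts to UTF-8 without a problem. Windows-1252 assigns smart quotes and dashes to bytes in the 0x80-0x9F range, which ISO-8859-1 reserves for control characters, so decoding with the declared charset loses them.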
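
For illustration, here is a minimal sketch of that conversion in PHP. It assumes the mbstring extension is available; the function name `cp1252ToUtf8` and the sample bytes are made up for this example and do not come from the linked answer.

```php
<?php
// Hypothetical helper: decode the bytes as Windows-1252 (not the
// ISO-8859-1 the server declares) and re-encode them as UTF-8.
function cp1252ToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// Made-up sample: "It's" with the apostrophe stored as the single
// Windows-1252 byte 0x92 (right single quotation mark).
$raw = "It\x92s";

// Decoding as ISO-8859-1 turns 0x92 into the control character U+0092,
// which is two bytes in UTF-8 and shows up as garbage.
$wrong = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Decoding as Windows-1252 turns 0x92 into U+2019, the intended quote.
$right = cp1252ToUtf8($raw);

echo bin2hex($wrong), PHP_EOL; // 4974c29273
echo bin2hex($right), PHP_EOL; // 4974e2809973
```

If mbstring is not installed, `iconv('Windows-1252', 'UTF-8', $raw)` is a possible alternative, although the encoding names iconv accepts depend on the platform.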