Using PHP against a UTF-8 compliant database. Here's how input goes in.
It comes out in the usual way, and I run unescape() on page load. This lets people, say, copy and paste directly from a Word document and have the smart quotes show up.
But HTMLPurifier seems to be clobbering non-UTF-8 special characters: ones that escape() to a simple % expression, like Ö, which escapes to %D6, whereas smart quotes escape to something like %u201C and go into the database that way. It takes out both the special character and the one immediately following it.
I need to change something in this process. Perhaps I need to change multiple things.
What can I do to not get special characters clobbered?
- textarea encoded with javascript escape()
escape() isn't safe for non-ASCII characters. Use encodeURIComponent() instead.
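To make the difference concrete, here is a small JavaScript sketch comparing the two functions on the characters from the question. It uses only built-in globals; the variable names are just for illustration.

```javascript
// escape() emits a Latin-1 style %XX for code units below 256
// and a non-standard %uXXXX form for everything above that.
const legacyUmlaut = escape('\u00D6');   // 'Ö' becomes '%D6'
const legacyQuote = escape('\u201C');    // left smart quote becomes '%u201C'

// encodeURIComponent() always emits percent-encoded UTF-8 bytes.
const utf8Umlaut = encodeURIComponent('\u00D6');  // '%C3%96'
const utf8Quote = encodeURIComponent('\u201C');   // '%E2%80%9C'

// '%D6' on its own is not a valid UTF-8 sequence, so a UTF-8 decoder rejects it.
let legacyIsValidUtf8 = true;
try {
  decodeURIComponent(legacyUmlaut);
} catch (e) {
  legacyIsValidUtf8 = false; // a URIError is thrown for the malformed sequence
}

// The UTF-8 form round-trips cleanly back to 'Ö'.
const roundTrip = decodeURIComponent(utf8Umlaut);
```

A lone 0xD6 byte is also the kind of thing a strict UTF-8 filter would treat as invalid, which would fit the symptom described in the question, though that is an inference rather than something confirmed here.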
- passed via HTTP post
I assume that you use XMLHttpRequest? If not, make sure that the page containing the form is served as UTF-8.
- decoded with PHP rawurldecode()
If you access the value through $_POST, you should not decode it, since PHP has already done that. Decoding it a second time will corrupt the data.
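The double-decoding problem can be illustrated without a server. The sketch below uses JavaScript's decodeURIComponent() as a stand-in both for the decoding PHP performs when it fills $_POST and for the redundant rawurldecode() call. The sample input is hypothetical.

```javascript
// What the user actually typed: a literal percent sign followed by "41".
const typed = '%41 percent';

// The client percent-encodes the value for transport.
const onTheWire = encodeURIComponent(typed); // '%2541%20percent'

// First decode: done automatically by the server when it parses the POST body.
const decodedOnce = decodeURIComponent(onTheWire); // '%41 percent'

// Second decode: the redundant manual step, which turns '%41' into 'A'.
const decodedTwice = decodeURIComponent(decodedOnce); // 'A percent'
```

After one decode the value matches what was typed. After two, any text that happens to look like a percent escape has been silently rewritten.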
- escaped for MySQL and stored in database
Make sure you don't have magic quotes turned on. Make sure that the database stores tables as UTF-8 (both the character set and the collation must be UTF-8). Make sure that the connection between PHP and MySQL is UTF-8 (use SET NAMES utf8 if you don't use PDO).
Finally, make sure that the page is served as UTF-8 when you output the string again.