I have a legacy database that claims to have its collation set to windows-1252 and is storing a text field's contents as "I’d". When it is displayed in a legacy web app, it shows as "I’d" in the browser. The browser reports a page encoding of UTF-8. I can't figure out how that conversion has been done (I'm almost certain it isn't via an on-the-fly search-and-replace). This is a problem for me because I am moving the text field (and many others like it) from the legacy database into a new UTF-8 database. A new web app displays the text from the new database as "I’d", and I would like it to show "I’d". I can't figure out how the legacy app could have achieved this (some fiddling in Ruby hasn't shown me a way to convert the string "I’d" to "I’d").
I've tied myself in a knot here somewhere.
It probably means the previous developer screwed up data insertion (or you're screwing up somewhere). The scenario goes like this:
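- The app receives the input as UTF-8.
- The app passes it to the database over a connection set to latin1.
- The database expects latin1, stores it as such (interprets ’ as ’).
- The legacy app reads it back over the same latin1 connection, so it gets the original bytes back unchanged and serves them in a UTF-8 page.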
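The misinterpretation in that scenario can be reproduced in plain Ruby, without a database. This is only a sketch: the helper name misread_utf8_as_cp1252 is made up for illustration, and it assumes the database's "latin1" behaves like Ruby's Windows-1252 encoding for the bytes involved.

```ruby
# Hypothetical helper: reproduces the bug by relabeling UTF-8 bytes as
# Windows-1252 and then transcoding the result to UTF-8.
def misread_utf8_as_cp1252(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

original = "I\u2019d"  # I’d, with U+2019 RIGHT SINGLE QUOTATION MARK

# U+2019 occupies three bytes in UTF-8: E2 80 99.
puts original.bytes.map { |b| format("%02X", b) }.join(" ")  # prints "49 E2 80 99 64"

# Read one byte at a time as Windows-1252, those three bytes become â, €, ™.
puts misread_utf8_as_cp1252(original)  # prints "I’d"
```

The string literals use \u escapes so the result does not depend on the source file's own encoding.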
You essentially need to repeat the same misinterpretation to get good data out. Right now you may be querying the database through a utf8 connection, so the database returns ’ encoded in UTF-8. What you need to do is query through a latin1 connection and interpret the returned data as UTF-8 instead.
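If the text has already been pulled out through a utf8 connection, roughly the same reversal can be done on the Ruby side: encode the string back to Windows-1252 to recover the original bytes, then relabel those bytes as UTF-8. The sketch below assumes Ruby's Windows-1252 encoding matches what the database calls latin1; repair_mojibake is a hypothetical name, and the fallback behaviour (return the input untouched when the round trip fails) is a design choice of this example, not something from the legacy app.

```ruby
# Hypothetical helper: undoes one round of "UTF-8 read as Windows-1252".
def repair_mojibake(text)
  # encode maps each character back to its single Windows-1252 byte;
  # force_encoding then relabels those bytes as UTF-8 without changing them.
  candidate = text.encode("Windows-1252").force_encoding("UTF-8")
  # If the bytes are not valid UTF-8, the text was probably not mojibake.
  candidate.valid_encoding? ? candidate : text
rescue Encoding::UndefinedConversionError
  # The text contains characters that Windows-1252 cannot represent.
  text
end

broken = "I\u00E2\u20AC\u2122d"  # I’d
puts repair_mojibake(broken)      # prints "I’d"
```

Text that is already correct, such as "I’d" itself, comes back unchanged because its Windows-1252 bytes do not form valid UTF-8. Ruby's Windows-1252 leaves a few bytes (for example 0x81 and 0x9D) undefined, so a stored character whose UTF-8 form contains one of them will not round-trip this way; querying through a latin1 connection avoids that limitation.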
See Handling Unicode Front To Back In A Web App for a more detailed explanation of all this.