PHP & MySQL: converting non-Unicode data to Unicode


I have characters like these on our web site: Fémnyomó

That is a street address, entered in another language (though I do not know which). Here's the db setup:

mysql 4.1.2log
charset cp1252 West European (latin1)

I'm using PHP without the mbstring extension (though I do no string conversions on this address; I just echo it).

If I changed the MySQL charset from cp1252 to UTF-8 and made sure I sent things like header( 'Content-Type: text/html; charset=UTF-8' ); would that improve my situation? Or is the data hosed because it was saved in the cp1252 charset, and there is nothing I can do? The original database was created in 2002 and has been used and expanded since. We've upgraded servers and re-imported dumps, but I'm ashamed to admit I never gave charsets much thought.

If I'm hosed, I'll probably just remove the text in those fields, but I'd like to support Unicode going forward. If I issue ALTER DATABASE database_name DEFAULT CHARACTER SET utf8; will that make sure future multibyte text is saved correctly, taking at least storage out of the equation (leaving me to worry about PHP)?

Thanks -


Solution

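  • 1) Change the database default charset to UTF-8. This only sets the default for tables created afterwards; existing tables and columns keep their current charset until they are converted:

    ALTER DATABASE database_name DEFAULT CHARACTER SET utf8;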
    

    2) Issue this before any query on the page:

    mysql_query("set names 'utf8'");
    

    3) Use this header:

    header( 'Content-Type: text/html; charset=UTF-8' );
    

    4) Insert this meta tag:

    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
    

    5) Also, read this: http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
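
As for whether the existing data is lost: text like Fémnyomó is what UTF-8 bytes look like when they are read as cp1252, so it can often be reversed. Below is a minimal sketch of that reversal, assuming the iconv extension is available and the script file itself is saved as UTF-8. The helper name `undo_cp1252_mojibake` is hypothetical and does not come from the linked article.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 bytes read as cp1252".
// Assumes the iconv extension is loaded.
function undo_cp1252_mojibake($garbled)
{
    // Map every character back to its single cp1252 byte. If the input
    // really was mojibake, those bytes are the original UTF-8 text.
    $bytes = @iconv('UTF-8', 'CP1252', $garbled);

    // iconv() returns false when a character has no cp1252 equivalent;
    // in that case the input was probably not mojibake, so keep it.
    if ($bytes === false) {
        return $garbled;
    }

    // Accept the result only if it is well-formed UTF-8. With the /u
    // modifier, preg_match() fails on invalid byte sequences.
    return preg_match('//u', $bytes) === 1 ? $bytes : $garbled;
}

echo undo_cp1252_mojibake("Fémnyomó"), "\n"; // prints "Fémnyomó"
```

Text that is already correct UTF-8 (or plain ASCII) comes back unchanged, so the helper can be run over a whole column to check before changing anything. One caveat: characters whose UTF-8 bytes land on cp1252's unassigned byte values may not convert with iconv, so check the results on a copy of the data first.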