php mysql utf-8 character-encoding latin1

How to retrieve UTF-8 data with PHP and show the correct encoding in an Excel sheet DB dump?


Hi, I am saving mostly English and German characters into a MySQL database that is currently set to the UTF-8 charset.

I am assuming that I should use the latin1 charset for this type of data. Is that correct?

If so, how can I change the charset to correct the German characters that are already saved as UTF-8?

UPDATE

Maybe it is a retrieval problem, then. When I export data from the DB via PHP, I of course get UTF-8 back. Can I make the retrieval give me latin1 instead?

UPDATE 1

OK, I am building a website. The HTML encoding is UTF-8 and the DB is UTF-8. Now I want to run some exports and extract data, which should be returned in an Excel sheet. The data is UTF-8, but here I need the characters to be latin1, or the encoding of the Excel sheet extracted from the DB needs to be such that Töst will show Täst. Right now I get the data like this -> Töst
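
For illustration, here is a minimal sketch of how a UTF-8 string turns into garbage when a reader assumes a single-byte encoding. It assumes the spreadsheet program reads the exported bytes as ISO-8859-1, which the post does not confirm; `misreadAsLatin1` is a hypothetical helper name, not part of the export script.

```php
<?php
// Hypothetical helper: simulate a reader that wrongly assumes ISO-8859-1.
// Every input byte is treated as one Latin-1 character and then re-encoded
// as UTF-8 so the result can be printed on a UTF-8 terminal.
function misreadAsLatin1(string $utf8Bytes): string
{
    $result = iconv('ISO-8859-1', 'UTF-8', $utf8Bytes);
    if ($result === false) {
        throw new RuntimeException('iconv() failed');
    }
    return $result;
}

// "Töst" as UTF-8 bytes: the ö is the two-byte sequence C3 B6.
$utf8 = "T\xC3\xB6st";

echo strlen($utf8), PHP_EOL;                    // 5 (bytes, for 4 characters)
echo bin2hex(misreadAsLatin1($utf8)), PHP_EOL;  // 54c383c2b67374
```

The two bytes of the ö are shown as two separate characters (Ã and ¶), so the word appears one character longer than it should.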

UPDATE 2

I am using the following PHP script to do the dump:

http://www.fundisom.com/phparadise/php/databases/mySQL_to_excel

On line 48 I have changed the code to

header("Content-Type: application/$file_type; charset=utf-8");

No change in behaviour.

How would I solve the issue?

Almost Solution

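<?php
// Assumes this source file itself is saved as UTF-8.
$text = "ö is a valid UTF-8 character";
echo 'Original : ', $text, PHP_EOL;
// //TRANSLIT approximates characters that ISO-8859-1 cannot represent.
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
// //IGNORE drops characters that ISO-8859-1 cannot represent.
echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
// With no suffix, iconv() fails on such characters and returns false.
echo 'Plain    : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;
?>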

I think this is what I need, but I still need to check it in the context of the PHP script ... tomorrow :-)
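
In the context of an export script, the conversion could be applied to every cell of a row before it is written out. This is only a sketch: `rowToLatin1` and `rowToTsvLine` are hypothetical helpers, and it assumes the export writes tab-separated lines, which may not match the linked script. How `//TRANSLIT` handles unrepresentable characters depends on the iconv implementation.

```php
<?php
// Hypothetical helper: convert every cell of one result row
// from UTF-8 to ISO-8859-1.
function rowToLatin1(array $row): array
{
    return array_map(function ($cell) {
        if ($cell === null) {
            return ''; // export NULL columns as empty cells
        }
        $converted = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', (string) $cell);
        // iconv() returns false on failure; emit an empty cell
        // rather than a boolean or broken bytes.
        return $converted === false ? '' : $converted;
    }, $row);
}

// Hypothetical helper: join the converted cells into one
// tab-separated line (assumed output format).
function rowToTsvLine(array $row): string
{
    return implode("\t", rowToLatin1($row)) . "\r\n";
}

$line = rowToTsvLine(['id' => 1, 'name' => "T\xC3\xB6st"]);
echo bin2hex($line), PHP_EOL; // 310954f673740d0a
```

The ö is now the single byte F6. If the output is converted this way, the charset named in the Content-Type header should presumably say ISO-8859-1 as well, so that the header and the bytes agree.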


Solution

  • I agree with the previous answers that UTF-8 is a good choice for most applications.

    Beware the traps that might be awaiting you, though! You'll want to be careful that you use a consistent character encoding throughout your system (input forms, output web pages, other front ends that might access or change the data).

    I have spent some unpleasant hours trying to figure out why a simple β or é was mangled on my web page, only to find that something somewhere had goofed up an encoding. I've even seen cases of text that gets run through multiple encoders, once turning a single quotation mark into eight bytes.

    Bottom line: don't assume the correct translation will be done; be explicit about character encoding throughout your project.

    Edit: I see in your update you've already started to discover this particular joy. :)
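
As an illustration of the multiple-encoder trap, here is one way a single quotation mark can grow to eight bytes. It assumes a typographic right single quote and a step that mistakes UTF-8 bytes for Windows-1252; the answer does not say which encodings were actually involved, and `encodeAgain` is a hypothetical helper.

```php
<?php
// Hypothetical helper simulating one faulty processing step: bytes that
// are already UTF-8 are mistaken for Windows-1252 and converted "to UTF-8"
// a second time.
function encodeAgain(string $bytes): string
{
    $result = iconv('CP1252', 'UTF-8', $bytes);
    if ($result === false) {
        throw new RuntimeException('iconv() failed');
    }
    return $result;
}

$quote = "\xE2\x80\x99";      // right single quotation mark, 3 bytes in UTF-8
$twice = encodeAgain($quote); // each of the 3 bytes becomes its own character

echo strlen($quote), ' -> ', strlen($twice), PHP_EOL; // 3 -> 8
echo bin2hex($twice), PHP_EOL;                        // c3a2e282ace284a2
```

The result is still valid UTF-8, so a validity check alone would not catch it. That is why the encoding has to be stated explicitly at every boundary instead of being guessed.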