I have been trying to solve this problem for about 1.5 hours, but I have been unable to get it to work. I also searched on Google.
The W3C Validator says that my server sends a US-ASCII header.
I wrote
<?xml version="1.0" encoding="utf-8"?>
in the XML.
I have MySQL with German text. The MySQL database is in utf8_unicode_ci
and works as it should: it saves öäü correctly.
Now I want to create an XML file from the data saved in the MySQL database. Everything works fine, but the umlauts (öäü) are not written.
I tried to use
$this->rss_data .=utf8_encode(....
but it didn't work.
$this->rss_data .=utf8_decode(...
also didn't work.
I also tried
fwrite($this->rss_file, utf8_encode($this->rss_data)) or die("Error while writing rss xml");
This also did not work.
The saved text looks like this in my XML
Betriebssysteme sind fü ;r Computer mit hö ;heren
My Firefox browser shows the öäü correctly, but I cannot get a valid RSS 2.0 feed, and so the feed entries don't show.
On your desktop:
Open the file in a program that handles UTF-8 correctly. UTF-8 without a BOM is identical to ASCII for the lowest code points, and some programs guess the encoding from a sample that does not necessarily contain any characters from the higher code points. (Note: Windows' notepad.exe is not the best choice for checking the file.)
The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well. (http://en.wikipedia.org/wiki/UTF-8)
Another way is to explicitly set the encoding to UTF-8 in the program and check the file with that setting.
Based on your last sentence ("My Firefox browser shows the öäü correctly, but I cannot get a valid RSS 2.0 feed, and so the feed entries don't show."), the encoding of the file is fine; it is the program you view it with and the server's headers that are incorrect.
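A minimal sketch of why an ASCII-only sample cannot reveal the encoding, and of how the file's bytes could be checked from PHP instead of a desktop editor. It assumes PHP 7 or later with the mbstring extension loaded; the sample strings are made up for illustration.

```php
<?php
// Hypothetical sample strings. "\u{00FC}" is ü written as an escape, so the
// result does not depend on how this script file itself is saved.
$asciiOnly  = "Betriebssysteme sind";
$withUmlaut = "f\u{00FC}r";

// Pure ASCII text is byte-for-byte valid UTF-8, so it gives a detector no hint.
var_dump(mb_check_encoding($asciiOnly, 'ASCII'));  // bool(true)
var_dump(mb_check_encoding($asciiOnly, 'UTF-8'));  // bool(true)

// ü (U+00FC) takes two bytes in UTF-8: C3 BC.
echo bin2hex($withUmlaut), "\n";                   // 66c3bc72

// The same word with ü as a single ISO-8859-1 byte (FC) is not valid UTF-8.
var_dump(mb_check_encoding("f\xFCr", 'UTF-8'));    // bool(false)
```

The same check could be run on the generated feed with something like mb_check_encoding(file_get_contents($path), 'UTF-8'), where $path is whatever file the script writes.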
On the server side:
After you have confirmed that the file is in UTF-8 by opening it in a program which correctly handles UTF-8 without BOM, you have to check your webserver's configuration (or at least the configuration of your subdomain).
You have to set the encoding for *.xml (or for the specific XML file) in the HTTP headers. If you are using pregenerated files, you have to do this in the domain's or the server's configuration.
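If the feed is not pregenerated but produced by a PHP script on each request, the charset can be sent from the script itself. The following is only a sketch under that assumption: feedContentType() is a hypothetical helper, $rssData stands in for the XML built from the database, and application/rss+xml is the commonly used media type for RSS (text/xml would carry the charset parameter just as well).

```php
<?php
// Hypothetical helper: builds the Content-Type header line for the feed.
function feedContentType(string $charset = 'UTF-8'): string
{
    return 'Content-Type: application/rss+xml; charset=' . $charset;
}

// Headers must be sent before any part of the body is output.
if (!headers_sent()) {
    header(feedContentType());
}

// Placeholder for the XML string assembled from the database rows.
$rssData = '<?xml version="1.0" encoding="utf-8"?>' . "\n"
         . '<rss version="2.0"></rss>';
echo $rssData;
```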
For files served by Apache, W3C's article "Setting charset information in .htaccess" could help. The relevant part:
Specifying by extension
Use the AddCharset directive to associate the character encoding with all files having a particular extension in the current directory and its subdirectories. For example, to serve all files with the extension .html as UTF-8, open the .htaccess file in a plain text editor and type the following line:
AddCharset UTF-8 .html
The extension can be specified with or without a leading dot. You can add multiple extensions to the same line. This will still work if you have file names such as example.en.html or example.html.en.
The example will cause all files with the extension .html to be served as UTF-8. The HTTP Content-Type header will contain a line that ends with the 'charset' information as shown in the example that follows.
Content-Type: text/html; charset=UTF-8
Note: All files with this extension in all subdirectories of the current location will also be served as UTF-8. If, for some reason, you need to serve the odd file with a different encoding you will need to override this using additional directives.
Note: You can associate the character encoding with any extension attached to your file. For example, suppose you do language negotiation and you have pages in two languages that follow the model example.en.html and example.ja.html. Let's also suppose that you are happy to serve English pages using your server's ISO-8859-1 default, but want to serve Japanese files in UTF-8.
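Applied to the feed from the question, the same directive would look roughly like this. It is a sketch that assumes the feed file has the extension .xml, that the server is Apache with mod_mime, and that .htaccess overrides are allowed for that directory.

```apache
# .htaccess in the directory that holds the pregenerated feed
# Adds "charset=UTF-8" to the Content-Type header of every .xml file
AddCharset UTF-8 .xml
```

After that, the validator should no longer report US-ASCII for the feed, provided the file itself really is UTF-8.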
Summarizing the comments
If you are using output escaping (htmlentities, htmlspecialchars, strip_tags, etc.), please check that these functions are not interfering with each other or being called multiple times.
Calling htmlentities() multiple times can lead to undesired results:
htmlentities('Ö') = &Ouml; (Ö in the browser)
htmlentities(htmlentities('Ö')) = &amp;Ouml; (&Ouml; in the browser)
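The double encoding can be reproduced with a few lines of PHP. This is a sketch assuming PHP 7 or later; the sample text is made up. It also shows two possible ways around the problem: the fourth argument of htmlentities() (double_encode), and, since XML itself predefines only amp, lt, gt, quot and apos, escaping just the markup characters with htmlspecialchars() and ENT_XML1 so that the umlauts stay as plain UTF-8.

```php
<?php
// Ö written as an escape, so the script file's own encoding does not matter.
$text = "\u{00D6}";

$once  = htmlentities($text, ENT_QUOTES, 'UTF-8');  // "&Ouml;"
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');  // "&amp;Ouml;"

// Fourth argument false: existing entities are not encoded again.
$guarded = htmlentities($once, ENT_QUOTES, 'UTF-8', false);  // "&Ouml;"

// For XML output: escape only the markup characters, keep ö as UTF-8 bytes.
$forXml = htmlspecialchars("h\u{00F6}heren & mehr", ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo $once, "\n", $twice, "\n", $guarded, "\n", $forXml, "\n";
```

Whether the named entities or the raw UTF-8 characters are the better fit depends on how the feed is consumed; the raw characters only work if the file and the HTTP header both declare UTF-8.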