I have a folder containing an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<cities>
<result>
<city_id>-3870534</city_id>
<country>mx</country>
<name>Santa Bárbara</name>
<nr_hotels>0</nr_hotels>
<translations>
<language>en-gb</language>
<name>Santa Bárbara</name>
</translations>
<translations>
<language>ru</language>
<name>Санта-Барбара</name>
</translations>
</result>
</cities>
<!-- RUID: [UmFuZG9tSVYkc2RlIyh9YcxtmfhRwqry58sgWYNIgEV1AjdsVswrKUorBoUlR6ylFgiaj5XJ0w0DP0lL/htWqOKtE33w1EhBbLABKokIfEo=] -->
The file looks well formed and is encoded in UTF-8. It contains Russian text and accented characters such as the "á" in Santa Bárbara.
I need to read this file and create a record in a MySQL database (through C#), but I'm running into encoding problems.
PS: the DB table has a few columns (to store city id, country and city translations), all text fields, utf8_general_ci.
I'm using the following code to read the files in a folder (just one in this case):
foreach (string file in Directory.EnumerateFiles("C:\\xml_folder\\" + sub_folder, "*.xml")) {
Console.WriteLine(file);
string response = File.ReadAllText(file, Encoding.GetEncoding("Windows-1252"));
Console.WriteLine(response);
var document = XDocument.Parse(response);
foreach (var child in document.Root.Elements("result")) {
//... code here
String name_it = "";
String name_en = "";
String name_es = "";
String name_fr = "";
String name_de = "";
String name_ru = "";
foreach (var translationsChild in child.Elements("translations"))
{
switch (translationsChild.Element("language").Value)
{
case "it":
bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
name_it = Encoding.UTF8.GetString(bytes);
break;
case "en-gb":
bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
name_en = Encoding.UTF8.GetString(bytes);
break;
case "es":
bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
name_es = Encoding.UTF8.GetString(bytes);
break;
case "fr":
bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
name_fr = Encoding.UTF8.GetString(bytes);
break;
case "de":
bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
name_de = Encoding.UTF8.GetString(bytes);
break;
case "ru":
bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
name_ru = Encoding.UTF8.GetString(bytes);
Console.WriteLine(name_ru);
break;
}
}
In short, I read the file, then parse it as XML to read all the children and save them to the DB.
The problem seems related to the encoding I use when reading the string from the file. I tried reading it as Windows-1252:
string response = File.ReadAllText(file, Encoding.GetEncoding("Windows-1252"));
I also tried reading it as UTF-8:
string response = File.ReadAllText(file, System.Text.Encoding.UTF8);
but every time I get (in the debug console and in the DB), this:
Santa Bárbara -> Santa B?rbara
Санта-Барбара -> ?????-??????
It looks like a problem with the way File.ReadAllText(...) works: the encoding argument seems to have no effect at all.
PS: to store the data in the DB I use a DML statement like this:
cmd.CommandText = "INSERT INTO cities (city_id,country,name,nr_hotels,name_it,name_en,name_es,name_fr,name_de,name_ru,last_modified_date) VALUES(@city_id,@country,@name,@nr_hotels,@name_it,@name_en,@name_es,@name_fr,@name_de,@name_ru,@last_modified_date) on duplicate key update city_id=@city_id,country=@country,name=@name,nr_hotels=@nr_hotels,name_it=@name_it,name_en=@name_en,name_es=@name_es,name_fr=@name_fr,name_de=@name_de,name_ru=@name_ru,last_modified_date=@last_modified_date";
Can you please help me?
Thanks in advance.
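I don't see any point in converting to a byte array and back. Element(...).Value is already a correctly decoded .NET string, so re-encoding it with Encoding.Default and decoding the bytes as UTF-8 can only corrupt it. This works properly for me: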
string response = File.ReadAllText(file, Encoding.UTF8);
var document = XDocument.Parse(response);
foreach (var child in document.Root.Elements("result"))
{
//... code here
String name_en = "";
String name_ru = "";
foreach (var translationsChild in child.Elements("translations"))
{
var name = translationsChild.Element("name").Value;
Console.WriteLine(name);
switch (translationsChild.Element("language").Value)
{
case "en-gb":
name_en = name;
break;
case "ru":
name_ru = name;
break;
}
}
}
Output:
Santa Bárbara
Санта-Барбара
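
To illustrate why the byte round trip from the question loses characters, here is a minimal sketch. It makes a few assumptions that are not in the original code: ISO-8859-1 stands in for Encoding.Default (on .NET Framework that is the system ANSI code page, which varies by machine), the class name EncodingSketch and the temp file name are hypothetical, and XDocument.Load is shown as an alternative to File.ReadAllText plus XDocument.Parse so that the parser picks the encoding from the XML declaration itself.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class EncodingSketch
{
    // Reproduces the question's round trip with a single-byte encoding.
    // Assumption: ISO-8859-1 stands in for Encoding.Default here.
    // Characters the encoding cannot represent are replaced with '?'
    // before the UTF-8 decoding step even runs.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    // Lets the XML parser choose the encoding from the file's declaration
    // and returns the <name> of every <translations> element.
    public static string[] LoadNames(string path)
    {
        XDocument document = XDocument.Load(path);
        return document.Root
            .Elements("result")
            .Elements("translations")
            .Select(t => (string)t.Element("name"))
            .ToArray();
    }

    public static void Main()
    {
        // A console that is not set to UTF-8 may print '?' even when
        // the strings in memory are correct.
        Console.OutputEncoding = Encoding.UTF8;

        // The Cyrillic letters are already gone after GetBytes.
        Console.WriteLine(RoundTrip("Санта-Барбара"));

        // Hypothetical sample file, written as UTF-8 without a BOM.
        string path = Path.Combine(Path.GetTempPath(), "cities_sketch.xml");
        string xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<cities><result>" +
            "<translations><language>en-gb</language><name>Santa Bárbara</name></translations>" +
            "<translations><language>ru</language><name>Санта-Барбара</name></translations>" +
            "</result></cities>";
        File.WriteAllText(path, xml, new UTF8Encoding(false));

        foreach (string name in LoadNames(path))
        {
            Console.WriteLine(name);
        }
    }
}
```

If the names print correctly but the database still shows question marks, the file reading is probably not the culprit. In that case the character set used by the database connection is worth checking separately, which is outside what this sketch covers.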