Tags: c#, xml, encoding, special-characters, decoding

Getting unknown characters while decoding a string in C#


I am working on a project where form data is submitted as XML and stored as XML in my database.

When I decode the XML data in C#, I get unknown characters. The text being saved is Spanish, for example "Introduzca texto aquí".

In the XML, the 'í' character arrives as %ED, and when I decode it with HttpUtility.UrlDecode(formData) I get � instead of í.

XML Data before decoding

%3CArrayOfDiary%3E%3CDiary%3E%3CDate%3E03042015%3C/Date%3E%3CSituation%3EIntroduzca%20texto%20aqu%ED%3C/Situation%3E%3CSensation%3EIntroduzca%20texto%20aqu%ED%3C/Sensation%3E%3CConcern%3EIntroduzca%20texto%20aqu%ED%3C/Concern%3E%3CBeliefRating%3E0%3C/BeliefRating%3E%3CAnxietyRating%3E0%3C/AnxietyRating%3E%3C/Diary%3E%0A%20%20%3CArrayOfDiary%3E

Data after decoding

<ArrayOfDiary><Diary><Date>03042015</Date><Situation>Introduzca texto aqu�</Situation><Sensation>Introduzca texto aqu�</Sensation><Concern>Introduzca texto aqu�</Concern><BeliefRating>0</BeliefRating><AnxietyRating>0</AnxietyRating></Diary>
<Diary>
<Date>03042015</Date>
<Situation> Introduzca texto aqu�</Situation>
<Sensation> Introduzca texto aqu�</Sensation>
<Concern> Introduzca texto aqu�</Concern>
<BeliefRating>0</BeliefRating>
<AnxietyRating>0</AnxietyRating>
</Diary>
</ArrayOfDiary>

Please help me. Thanks


Solution

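  • Without seeing where the data comes from, I assume that it was created with the ISO-8859-1 encoding: %ED is the single byte 0xED, which is í in ISO-8859-1, whereas UTF-8 would encode í as %C3%AD. HttpUtility.UrlDecode(String) assumes UTF-8, where a lone 0xED byte is invalid, so it is replaced with �.

    You can get around the problem by passing the appropriate Encoding to UrlDecode: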

    Option Infer On
    Imports System.Text
    Imports System.Web
    ' ....
    Dim s = "%3CArrayOfDiary%3E%3CDiary%3E%3CDate%3E03042015%3C/Date%3E%3CSituation%3EIntroduzca%20texto%20aqu%ED%3C/Situation%3E%3CSensation%3EIntroduzca%20texto%20aqu%ED%3C/Sensation%3E%3CConcern%3EIntroduzca%20texto%20aqu%ED%3C/Concern%3E%3CBeliefRating%3E0%3C/BeliefRating%3E%3CAnxietyRating%3E0%3C/AnxietyRating%3E%3C/Diary%3E%0A%20%20%3CArrayOfDiary%3E"
    ' Decode the percent-encoded bytes as ISO-8859-1 instead of the UTF-8 default.
    Dim enc = Encoding.GetEncoding("ISO-8859-1")
    Dim txt = HttpUtility.UrlDecode(s, enc)
    

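    To avoid that hassle, you could use <meta charset="utf-8" /> in the <head> section of the web page. Assuming the encoded data comes from the browser's form submission, the form should then be sent as UTF-8, which UrlDecode expects by default. You can still have <html lang="es"> if you want to indicate that the page is in Spanish.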
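
Since the question is about C#, here is a rough C# equivalent of the VB.NET snippet above. It is only a sketch under the same assumption as the answer, namely that the sender encoded the text as ISO-8859-1. The class name FormDataDecoder and the helper DecodeLatin1 are made up for illustration, and the sample string is a shortened piece of the data from the question.

```csharp
using System;
using System.Text;
using System.Web;

public static class FormDataDecoder
{
    // Hypothetical helper: decodes percent-encoded text whose bytes are ISO-8859-1.
    public static string DecodeLatin1(string formData)
    {
        // Assumption: the sender used ISO-8859-1, where the single byte 0xED is 'í'.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return HttpUtility.UrlDecode(formData, latin1);
    }

    public static void Main()
    {
        // Shortened sample taken from the data in the question.
        string formData = "%3CSituation%3EIntroduzca%20texto%20aqu%ED%3C/Situation%3E";

        // The single-argument overload assumes UTF-8, where a lone 0xED byte is invalid.
        string asUtf8 = HttpUtility.UrlDecode(formData);

        // Decoding with the assumed source encoding keeps the accented character.
        string asLatin1 = DecodeLatin1(formData);

        // Make the console emit UTF-8 so that non-ASCII characters can be displayed.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(asUtf8);
        Console.WriteLine(asLatin1);
    }
}
```

With the sample above, asLatin1 should contain "aquí", while asUtf8 should show the same � replacement character as in the question. If the data turns out to come from a different encoding (for example Windows-1252), the name passed to Encoding.GetEncoding would have to change accordingly.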