
Getting U+fffd/65533 instead of special character from Query String


I have a C# .NET web project that has a globalization tag set to:

<globalization requestEncoding="utf-8" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>

The problem appears when a Flash application requests this URL (you get the same problem when you enter the URL manually in a browser): c_product_search.aspx?search=kjøkken (alternatively: c_product_search.aspx?search=kj%F8kken)

Both return the following character codes:

k U+006b 107
j U+006a 106
� U+fffd 65533
k U+006b 107
k U+006b 107
e U+0065 101
n U+006e 110

I don't know much about character encoding, but it seems that the ø has been replaced by the Unicode replacement character, right?

I tried to change the globalization tag to:

<globalization requestEncoding="iso-8859-1" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>

That made the request work. However, now, other searches on my page stopped working.

I also tried the following with similar results:

NameValueCollection qs = HttpUtility.ParseQueryString(Request.QueryString.ToString(), Encoding.GetEncoding("iso-8859-1"));
string search = (string)qs["search"];

What should I do?

Kind Regards,

nitech


Solution

  • The problem comes from the combination of Firefox and ASP.NET. When you manually enter a URL in Firefox's address bar and the URL contains French or Swedish characters, Firefox encodes the URL as ISO-8859-1 by default.

    But when ASP.NET receives such a URL, it assumes that it is UTF-8 encoded, and the characters that are not valid UTF-8 become U+fffd. I couldn't find a way in ASP.NET to detect that the URL is ISO-8859-1. Request.Encoding is set to utf-8 ... :(
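The effect described above can be reproduced outside ASP.NET. The following is only a minimal sketch: the class name `ReplacementCharDemo` and its members are made up for illustration, and the byte array is what percent-decoding `kj%F8kken` yields. It decodes the same bytes once as UTF-8 and once as ISO-8859-1.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Bytes obtained by percent-decoding "kj%F8kken"; 0xF8 is 'ø' in ISO-8859-1.
    public static readonly byte[] Raw = { 0x6B, 0x6A, 0xF8, 0x6B, 0x6B, 0x65, 0x6E };

    // Decodes the raw bytes with the named encoding (hypothetical helper).
    public static string Decode(string encodingName) =>
        Encoding.GetEncoding(encodingName).GetString(Raw);

    static void Main()
    {
        // 0xF8 cannot start a valid UTF-8 sequence, so the decoder substitutes U+FFFD.
        foreach (char c in Decode("utf-8"))
            Console.WriteLine("U+{0:x4} {1}", (int)c, (int)c);

        // The same bytes read as ISO-8859-1 give the intended word.
        Console.WriteLine(Decode("iso-8859-1"));
    }
}
```

The third character printed by the UTF-8 pass should be the replacement character, which matches the character codes listed in the question.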

    Several solutions exist:

    • Put <globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"/> in your Web.config. But this may come with other problems, and your application won't be standard anymore (it will not work with languages like Japanese) ... And anyway, I prefer using UTF-8!

    • Go to about:config in Firefox and set the value of network.standard-url.encode-query-utf8 to true. It will now work for you (Firefox will encode all your URLs as UTF-8), but not for anybody else ...

    • The least bad solution I could come up with was to handle this in code. If the default decoding didn't work, we reparse the query string as ISO-8859-1:

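      // Request.QueryString returns the value already decoded with requestEncoding (utf-8).
      string query = Request.QueryString["search"];
      // A U+FFFD character (or its "%ufffd" escape) means the bytes were not valid UTF-8.
      if (query != null && (query.Contains("\ufffd") || query.Contains("%ufffd")))
          query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))["search"];
      // Decode any percent escapes that are still left in the value.
      query = HttpUtility.UrlDecode(query);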
      

    It works with hyperlinks and manually entered URLs, in French, English, or Japanese. But I don't know how it will handle other encodings like ISO-8859-5 (Russian) ...
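A variant of the fallback above can be sketched without searching for the replacement character: decode the percent-escaped bytes with a strict UTF-8 decoder, and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. This is a sketch under assumptions, not a drop-in fix: the class `QueryValueDecoder` is hypothetical, and it assumes you can get hold of the still percent-encoded value (for example `kj%F8kken`). It does not help with other single-byte encodings such as ISO-8859-5, and ISO-8859-1 input that happens to form valid UTF-8 would still be read as UTF-8.

```csharp
using System;
using System.Text;
using System.Web;

static class QueryValueDecoder
{
    // Strict UTF-8: throws on invalid bytes instead of emitting U+FFFD.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // rawValue is the still percent-encoded parameter value (an assumption of this sketch).
    public static string Decode(string rawValue)
    {
        if (rawValue == null)
            return null;

        // Turn "%XX" escapes (and '+') into raw bytes without choosing a charset yet.
        byte[] bytes = HttpUtility.UrlDecodeToBytes(rawValue);
        try
        {
            return StrictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the client sent ISO-8859-1.
            return Latin1.GetString(bytes);
        }
    }
}
```

With this helper, `QueryValueDecoder.Decode("kj%F8kken")` and `QueryValueDecoder.Decode("kj%C3%B8kken")` should both give kjøkken.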

    Does anyone have a better solution?

    This solves only the problem of manually entered URLs. In your hyperlinks, don't forget to encode URL parameters with HttpUtility.UrlEncode on the server, or encodeURIComponent in JavaScript code, and use HttpUtility.UrlDecode to decode them.
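For the hyperlink case, a round trip through HttpUtility might look like the sketch below. The class `SearchLink` and its `Build` method are hypothetical names; the page name is the one from the question. HttpUtility.UrlEncode(string) encodes as UTF-8 by default, which matches requestEncoding="utf-8".

```csharp
using System;
using System.Web;

static class SearchLink
{
    // Builds a link whose search parameter is percent-encoded as UTF-8.
    public static string Build(string term) =>
        "c_product_search.aspx?search=" + HttpUtility.UrlEncode(term);

    static void Main()
    {
        string url = Build("kjøkken");
        Console.WriteLine(url);

        // Parse the query part back; ParseQueryString decodes as UTF-8 by default.
        string query = url.Substring(url.IndexOf('?'));
        Console.WriteLine(HttpUtility.ParseQueryString(query)["search"]);
    }
}
```

Because the link is generated with UTF-8 escapes, the receiving page should see the original term without needing any ISO-8859-1 fallback.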